recap product design report
A Real-time Captioning Solution for Class
Course
96717 - Special Topics: Technology-based Product Innovation and Enterprise Creation
Instructors
Prof. Jonathan Cagan, Prof. John Evans
Technologist
Prof. Ian Lane
Entrepreneurship Mentor
Tom Chiu
Team
Adhithi AJI, Cheng(Vanessa) LI, Fan Sai KUOK, Sanika KOKOTE
ReCap | A Realtime Captioning Solution for Class | 1
Contents
1 Introducing ReCap
1.1 What is ReCap
1.2 Stakeholders
1.3 System Structure
1.4 Core Use Case
1.5 Service and Product Use Flow
2 Designs in Detail
2.1 Industrial Design
2.2 User Interface Design
2.3 Technical Details
2.4 Cost Estimation
3 Stakeholder Testimonials
4 Value of ReCap
4.1 What Problem Is ReCap Solving?
4.2 What Makes ReCap Unique
4.3 10-Fold Improvement
4.4 Competitors
4.5 Value Opportunity Analysis
4.6 Value Opportunity Analysis in Detail
4.7 Secondary Competitors
5 About the Business
5.1 Market Size
5.2 Go to Market Strategy
5.3 Business Model
5.4 Financial
5.5 Funding Requirements
6 Project Development Process
7 Vision
7.1 ReCap as An Enterprise
7.2 Exit Strategy
Appendices
I Business canvas
II Survey from Western Pennsylvania Deaf School
III Stakeholder Analysis
IV Things We Have Done
V VOA for Phase 2 & 3 Product (previous plan)
References
1 Introducing ReCap
1.1 What is ReCap
ReCap is a combined product and service, based on speech recognition, that provides live
captioning for deaf and hard of hearing students in class. Unlike CART, ReCap solves the
problem with modern technology instead of human labor, which greatly lowers the cost
and improves the experience of using a live captioning service.
1.2 Stakeholders
The customers are schools and institutions, as they will purchase the service and devices
and provide them to the students.
The end users are hard of hearing students and their professors. We are targeting students
who can read English and need assistance beyond hearing aids.
1.3 System Structure
ReCap consists of a mic system (a portable mic and a mic base), an app for hard of hearing
students (compatible with mobile platforms such as laptops, tablets and phones), and a
website for backend management, which will be used by the officers in the disability office.
(Figure: Portable Mic, Mic Base, Mobile App for students, Website for officers)
1.4 Core Use Case
Professors or conference speakers hold the mic in hand or clip it to their clothing. The mic
captures the sound and transmits it to the mic base, which can be placed on a table. The mic
base receives the input as radio signals and converts it into data packets, which it
transmits via wifi (or bluetooth as a backup when wifi is not available). The mic base thus
acts as the mic charger, as the microcontroller that converts radio signals to data packets,
and as the modem/router.
1.5 Service and Product Use Flow
1. The disability office purchases the service after getting approval and funding from
the university.
2. Students with hearing loss can apply for the service by visiting the disability office
in person or by submitting the application form through the student online
system.
3. Once the application is approved, the student receives an email with the ReCap app
download link and instructions. He or she can log in to ReCap using their student ID.
4. To estimate how many devices the university will need, the disability office
officer logs in to ReCap’s website and manages student accounts and classrooms. If
15 students have classes in 9 different classrooms this semester, the
university should install at least 9 mic/mic base sets.
5. When professors enter the classroom, all they need to do is pick up the mic and
speak into it.
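The device estimate in step 4 amounts to counting the distinct classrooms in which registered
students have classes. The following is a minimal sketch of that count; the function name,
the (student, classroom) pairs and the sample data are hypothetical and are not part of the
ReCap website.

```python
def mic_sets_needed(enrollments):
    """Return the number of mic/mic base sets: one per distinct classroom in use.

    `enrollments` is an iterable of (student_id, classroom) pairs.
    This structure is an assumption made for illustration only.
    """
    classrooms = {room for _student, room in enrollments}
    return len(classrooms)


# Hypothetical data: 15 students whose classes fall in 9 different classrooms.
enrollments = [(f"student{i}", f"room{i % 9}") for i in range(15)]
print(mic_sets_needed(enrollments))  # 9
```

Counting classrooms rather than students reflects the fact that one mic set serves every
registered student in the same room.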
2 Designs in Detail
2.1 Industrial Design
The mic and mic base shells will be made of ABS and aluminum. The mic measures
0.8 x 1.4 x 0.4 inches and the mic base measures 3 x 2 x 1 inches. The base is designed
to be a signal hub and an easily accessible recharger, so the mic can be kept to a
minimal size and weight. The high-fidelity mic provides a reliable audio source, which
helps the speech recognition achieve high accuracy.
2.2 User Interface Design
After the speaker’s voice is captured by the mic and transmitted to the mic base,
the radio signal is converted into a bluetooth or wifi signal and pushed to the students’
devices.
The core function of the user interface is to show the captions as clearly as possible. In
captioning mode the whole screen is used to show the captions, and other interface
elements stay hidden until the user interacts with (taps on) the screen.
Further design and development could add more features for the classroom
scenario, such as note taking.
2.3 Technical Details
Mic, mic base and how signals are transmitted - The speaker speaks into the mic, which
converts the speech into audio signals. In a conventional setup, these are converted into
radio signals, sent by a transmitter and received by the loudspeakers. In our design,
however, the radio signals are instead transmitted to the mic base, where they are converted
to data packets using its in-house GPUs and CPUs. These data packets are then sent via wifi
or bluetooth to the end user’s phone/tablet/laptop. The data can be accessed via a mobile or
web app with a secure login.
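The conversion of audio into data packets can be illustrated with a small sketch. This
report does not specify a packet format, so everything below is an assumption made for
illustration: a 6-byte header carrying a sequence number and a payload length, and a
payload size of 320 bytes. The function names are hypothetical.

```python
import struct

# Assumed header layout: sequence number (uint32) + payload length (uint16),
# network byte order. This is not a format defined by ReCap.
HEADER = struct.Struct("!IH")


def packetize(pcm: bytes, payload_size: int = 320) -> list:
    """Split raw audio bytes into numbered packets."""
    packets = []
    for seq, start in enumerate(range(0, len(pcm), payload_size)):
        chunk = pcm[start:start + payload_size]
        packets.append(HEADER.pack(seq, len(chunk)) + chunk)
    return packets


def depacketize(packets: list) -> bytes:
    """Reassemble audio on the student's device, tolerating out-of-order arrival."""
    ordered = sorted(packets, key=lambda p: HEADER.unpack_from(p)[0])
    parts = []
    for packet in ordered:
        _seq, length = HEADER.unpack_from(packet)
        parts.append(packet[HEADER.size:HEADER.size + length])
    return b"".join(parts)


audio = bytes(range(256)) * 5                 # 1280 bytes of placeholder audio
packets = packetize(audio)
restored = depacketize(list(reversed(packets)))
print(len(packets), restored == audio)
```

The sequence number lets the receiving app restore the original order before handing the
audio to speech recognition, which matters when packets travel over wifi or bluetooth.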
Integration of speech recognition technology - The speech recognition will run on the
student’s device, so its CPU and GPU requirements will be met by the student’s
laptop/tablet/phone. As long as the device is able to receive the data
packets, it can transcribe the voice in real time.
2.4 Cost Estimation
Hardware: One set of hardware (mic and mic base) will cost approximately $31. We plan to produce 10,000 sets for the first round. Including the cost of the molding tools, the shells will cost around $6 per set. As market demand increases and the manufacturing technique matures, the cost will fall. The additional electronic components are a microcontroller and a modem chip, which will cost roughly $25 per set.
Service: The main cost of the service is providing reliable web servers. Due to the nature of the service, the load will not be heavy, so the cost will be fairly low.
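The hardware figures above can be checked with a few lines of arithmetic. This is a sketch
only: the constant and function names are hypothetical, and the first-run total is derived
from the stated per-set costs rather than quoted from a supplier.

```python
SHELL_COST = 6            # USD per set, including molding tools (from the estimate above)
ELECTRONICS_COST = 25     # USD per set: microcontroller and modem chip
FIRST_RUN_SETS = 10_000   # planned first production round


def cost_per_set() -> int:
    """Cost of one mic and mic base set."""
    return SHELL_COST + ELECTRONICS_COST


def first_run_cost(sets: int = FIRST_RUN_SETS) -> int:
    """Total hardware cost of a production run."""
    return sets * cost_per_set()


print(cost_per_set())     # 31
print(first_run_cost())   # 310000
```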
3 Stakeholder Testimonials
“It’ll be very useful if the hearing impaired person is not with an interpreter.”
“We can’t wait to try it out. Do let me know when you get the prototype ready.”
- Sally, Western Pennsylvania School for the Deaf
“The price will make it very competitive.”
“I like the feeling of independence and confidence. I can imagine how helpful it will be.”
- Maria, a hard of hearing student at Carnegie Mellon University
“Universities are always monitoring new technology. We are willing to evaluate the new
device.”
- Lawrence, Disability Office of Carnegie Mellon University
Overall, we got very positive feedback from the stakeholders, and we plan to stay in
touch with them and keep them updated on our progress.
4 Value of ReCap
4.1 What Problem Is ReCap Solving?
The global population with hearing disability is 360 million (5% of the total population).
This means that out of every 100 people, five are isolated from the rest. Hearing loss
has a huge negative impact on the quality of communication.
Hearing impaired children often cannot go to public school and receive the same education
as other children. If they later move to mainstream schools, they struggle to make
friends because they and their classmates find it difficult to understand each other. The
most common result is that they talk only with their interpreters and eventually give up
trying to enter hearing society. Many of them remain within the hearing impaired
community for their whole lives and can’t enjoy music, lectures, movies and radio as other
people do.
For hearing impaired people, choosing a suitable device can be hard. A hearing
aid is an electroacoustic device designed to amplify sound for the wearer, usually
with the aim of making speech more intelligible. Cochlear implants may help provide
hearing in patients who are deaf because of damage to sensory hair cells in their cochleas.
In those patients, the implants often can enable sufficient hearing for better understanding
of speech. But the quality of sound is different from natural hearing, with less sound
information being received and processed by the brain. While these products do help
make speech comprehensible, they are suboptimal in noisy situations: the hearing
aid amplifies the noise along with the speech, and the decoding quality of the cochlear
implant is not very good.
Besides the sound quality, other issues such as being bulky, expensive, inconvenient or
high-risk also bother hearing impaired people. For example, a cochlear implant
requires surgery (which may bring risks and suffering), is costly (different states cover
different portions of the cost), is easy to lose (young children can lose it while playing or
taking a shower), and is not good-looking when worn.
In the classroom, many universities now use CART
(Communication Access Realtime Translation), which sends an interpreter to sit with
the student and type on a special keyboard, so that the student can read the text on a
computer display. It has many drawbacks: scheduling time with the interpreter is
difficult, the service is expensive, it feels awkward, and it keeps the deaf student from
communicating with classmates.
More on competitors can be found in the “Competitors” section.
To conclude, hearing impaired people (especially students) lack a cheap, easy and
reliable way to help with communication. Our product provides one.
Apart from the end users, we also deliver considerable value to our immediate
customers, the educational institutions, by giving their students a superior service
at a much lower cost than is possible with existing solutions. Moreover,
the professors, who are important stakeholders, have complete control over the content that
is being delivered.
4.2 What Makes ReCap Unique
ReCap is a live transcribing device that has been optimized for the classroom setting.
Students can place the phone/tablet in front of them, and read the real-time transcription
on the screen.
With the product, hearing impaired students can easily follow the content of the lecture
even if they don’t have other devices to help them hear. For a student who already has a
cochlear implant or hearing aids, ReCap is a helpful supplement for adapting to
different accents and for hearing when ambient noise is a problem.
ReCap will be designed exclusively for classroom/seminar scenarios, so we’ll provide
features such as note taking or annotation on audio, so that students can better review the
lecture. University policy will not be against recording in the classroom, because CART
already makes the text accessible to deaf students.
4.3 10-Fold Improvement
CART, the service currently used in classrooms to provide transcription for hard of hearing
students, costs around $100/hour. With an average of 20 hours of classes a
week, that comes to $2,000 a week, i.e. $8,000 a month. This is a tremendous expense for
the student or even the school to bear. Our technology, on the other hand, costs $200 per
month per student, a gigantic leap from what is being used currently. We also
offer packages for schools with larger numbers of students who
require transcription in class, thereby drastically decreasing the expense borne by
these institutions.
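The comparison above can be reproduced as a short calculation. It is a sketch under the
figures quoted in this section, with one added assumption: a month is treated as four
weeks, which is what the $2,000 to $8,000 step implies. The names are hypothetical.

```python
CART_RATE_PER_HOUR = 100      # USD, from this section
CLASS_HOURS_PER_WEEK = 20     # average class load assumed in this section
WEEKS_PER_MONTH = 4           # assumption implied by $2,000/week -> $8,000/month
RECAP_PRICE_PER_MONTH = 200   # USD per student


def cart_monthly_cost() -> int:
    """Monthly cost of CART for one student."""
    return CART_RATE_PER_HOUR * CLASS_HOURS_PER_WEEK * WEEKS_PER_MONTH


def improvement_factor() -> float:
    """How many times cheaper ReCap is than CART per student per month."""
    return cart_monthly_cost() / RECAP_PRICE_PER_MONTH


print(cart_monthly_cost())    # 8000
print(improvement_factor())   # 40.0
```

On these numbers the per-student price gap is about 40 times before any group discount,
comfortably above the ten-fold target in the section title.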
Our product requires minimal maintenance and does not depend on any individual, so
there are no scheduling conflicts. Scheduling is one of the major problems for students
who use CART now: they cannot spontaneously attend a class whose time slot has
shifted, because they have to inform CART and find out whether a captionist is available
in the new slot.
4.4 Competitors
As we look to address the needs of hearing impaired individuals in a classroom (academic)
scenario, our main competitor is one of the tools currently in use, CART.
CART is a service, provided by Communication Access Realtime
Translation Inc, in which a person (the “captioner”) accompanies the hearing impaired
person to events such as classes, conferences and meetings and manually transcribes
the content. CART currently charges a minimum fee of $225 to $300 for
the first two hours or any part thereof, and $75 to $100 per hour thereafter.
Our key competitive advantages over CART would be:
● Independence - An attribute highly valued by hearing impaired individuals.
● Power - Gives the hearing impaired individual power, confidence and flexibility.
● Cost - Because we eliminate the human factor, the cost incurred by the individual or
the institute is greatly reduced.
                    ReCap                                CART
Price               Around $200/month per student        Around $100/h per student
Weight              Less than 1 lb                       40 pounds +
Skill Requirement   None                                 Experienced Captionist
Speed               5x (People read at 250 words/min)    140 words/min
4.5 Value Opportunity Analysis
We had the opportunity to interview a hearing impaired person to get input for
the VOA. Our end user is currently pursuing her MBA at the Tepper School of Business. She
has a cochlear implant in her left ear and uses the CART service to attend
classes. The university currently pays for the CART service.
The VOA below directly reflects our end user’s thoughts about, and aspirations for,
our product. We are currently interviewing more people from the hearing impaired
community through Facebook groups, with the intention of including our end users in the
design process in order to deliver substantial value to our
customers.
Rating scale: Low / Medium / High (NA = not applicable)

Value             Sub-value               CART    Our product
Emotion           Sense of adventure
                  Feel of independence
                  Sense of security
                  Sensuality
                  Confidence
                  Power
Aesthetics        Visual                  NA
                  Tactile                 NA
                  Auditory                NA      NA
                  Olfactory               NA      NA
                  Gustatory               NA      NA
Impact            Social
                  Environmental
Identity          Personality
                  Point in time
                  Sense of place
Ergonomics        Ease of use
                  Safety                  NA
                  Comfort
Core Technology   Reliable
                  Enabling
Quality           Craftsmanship           NA
                  Durability              NA
4.6 Value Opportunity Analysis in Detail
Emotion
- Sense of adventure
Sense of adventure is the ability of a product or service to enable the individual to be
adventurous and impulsive. It removes hesitation about trying out new scenarios and
experiences.
As the CART service requires the involvement of an additional person, its sense of adventure
is very low. It gives no flexibility to choose different situations, and it
does not encourage individuals to be impulsive and spontaneous.
Our technology, ReCap, on the other hand offers a better sense of adventure, as it can be
used in any scenario as long as a compatible electronic gadget is present. It gives the
individual the flexibility and confidence to engage in different experiences and be
spontaneous.
- Feel of Independence
The CART service offers a very low feel of independence, as it involves another person
on whom the hearing impaired individual depends at all times during the class. If
the service is unable to allocate a transcriber, the individual is left in an awkward
situation, having been completely dependent on it.
Our technology offers a very high feel of independence, as it can be used in any situation
and does not require the speaker to be visible. It can also work with multiple individuals
talking, using just a smartphone or a tablet. It does not require an internet
connection either, which adds to the user’s feel of independence. One of the key insights
we found was that hearing impaired individuals don’t like to stand out in class because of
the interpreter’s presence. So on this factor our technology gives very high satisfaction.
- Sense of Security
The CART service ranks medium on sense of security. It involves
a third party, which diminishes the sense of security.
Our technology can be accessed on the individual’s personal electronic gadget like a
smartphone or a tablet, security settings of which can be entirely controlled by the
individual. Also the technology does not require an internet connection, hence all
transcribed conversation remains on the device and is not visible to anyone who cannot
access the gadget. Hence, our technology induces a high sense of security.
- Confidence
The CART service instills a moderate amount of confidence in the hearing impaired
individual, as they can understand what is being discussed in class. But it does not give
the individual enough confidence to actually participate in the discussion, as there is a
considerable time lag between the spoken word and the user reading it on the screen.
On the other hand, our technology gives the user high confidence: they do not depend on
another person or an internet connection, and because the transcription is real time,
they have the confidence to participate in conversations and discussions.
- Power
The CART service gives a low sense of power to the user, as he/she is dependent on the
transcriber for all the content being said in class. The user doesn’t feel very empowered
with just the CART service.
Our technology gives the user a very high sense of power, as they can understand and
participate in discussion just by using their phone, tablet or laptop, which many students
already use to take notes; hence they can feel like full participants in the class.
Aesthetics
- Visual
Not applicable to the CART service, as it does not possess any aesthetic aspects.
Our technology ranks high from the visual aesthetic point of view. It involves the use of
everyday electronic gadgets and hence looks very familiar. Also, the user interface is
designed to be very convenient and visually appealing.
- Tactile
Not applicable to the CART service, as it does not possess any aesthetic aspects.
Our technology is contained within a device such as a smartphone, tablet or laptop,
and so has the same tactile features as the host device. Hence, it has been ranked medium
on this parameter.
Impact
- Social
The CART service has a low social impact, because it is accessible to the individual only in
a classroom scenario. The service is not available for situations unrelated to
academic lectures, and is sometimes unavailable even for conferences and lectures outside
regular classes.
Our technology ranks high on social impact because it can be used in any
environment. It gives the hearing impaired individual an incentive to engage confidently in
social settings. Also, as it requires little hardware beyond
commonly used electronic gadgets and a small mic system, it blends well into a
social situation, giving the user a feeling of normalcy.
- Environment
The CART service relies on a heavy, bulky, specially designed keyboard used by the
interpreter, which has a negative impact on the environment.
Our technology is an app on an electronic device plus a small mic system, so the
environmental degradation it causes is very limited; thus we ranked it medium
from the environmental impact point of view.
Identity
- Personality
Identity is an expression and conception of a person. The clothes he/she wears, the computer
he/she uses and the bags he/she owns are all factors that form and influence an identity.
For a hearing impaired person, the medium of communication is a crucial part of his/her
identity. Its physical shape, sound and manner of use will affect how other people and
society perceive him/her.
As for personality, because the CART service requires a companion wherever the hearing
impaired person goes, it’s difficult to imagine him/her as an energetic, independent,
well-rounded person with a high level of social involvement and sense of security. Take
Maria as an example: since another person is assigned to be with her when she is in class,
and she needs to pay attention to the transcriber all the time, her communication with her
classmates is blocked to some extent, which makes her feel isolated and vulnerable, and
makes her classmates think she is not easy to get along with. So the CART service gets a
very low score in reflecting personality.
As for our product, since it will be used on smartphones, it suggests that the
user knows technology well and is open to trying out new things. According to
statistics Nielsen released earlier this year, two thirds of Americans have smartphones,
and 80% of people between 18 and 24 years old use smartphones. Using a smartphone is a
good sign of keeping up with trends and not being isolated or out of date. And compared to
having someone sit next to you and type for you, or wearing a device in your ear all
day, it’s not awkward at all to put a smartphone on the desk in class.
- Point in time
In the age of digitalization, CART still relies on human transcription, so it gets a low score.
Our product has perfect timing: a huge number of people use smartphones and
rely on apps, so our app should be one of the best solutions at this time.
- Sense of place
In a classroom, or any other venue, the people present should be students or other audience
members, but the CART transcriber is neither. The sense of place is therefore low for the
CART service.
Our product has a good sense of place because it’s natural to use, easy to stop using, and
doesn’t suggest any difference between hearing impaired people and hearing people.
Ergonomics
- Easy to use
The CART service is not easy to use. Students have to coordinate their schedule with the
transcriber, and if something changes without notice, it is difficult to coordinate with the
CART person. In Maria’s experience, CART only helps her in class, so if she wants to attend
a seminar or conference, it is hard to ask CART to come along. Though the quality is fine
according to Maria, we still give it a low score for usability.
Our product relies on a smartphone but doesn’t need an internet connection, so it will be
very straightforward to use. Since most people carry their phones every day, they can
easily open the app and begin to use our product. Also, we will design an easy-to-use
interface that requires little learning time. Hence, we gave our product full
marks on this criterion.
- Safety
The CART service is not related to safety, so this criterion is not applicable.
Our product is, as far as we can see, the safest solution for people with
hearing loss. All we require is a smartphone, tablet or laptop as the physical device to
store and run the app, and since smartphones have to pass multiple regulations to ensure
safe use, we can guarantee that using our app is safe.
- Comfort
Having someone beside you is not always comfortable, especially when that person is acting
like a “tool”. It feels awkward to interact with someone in such a relationship, as if the
transcriber were just a machine for completing tasks. As Maria said, it takes
some time to get used to CART, and having CART somewhat negatively affects the
relationship between her and her classmates.
Our app did very well when we evaluated the level of comfort. Compared to CART,
hearing impaired people can use a phone, tablet or computer to read the text silently,
without disturbing anyone. Compared to hearing aids, they do not need to endure
amplified noise or strain to discern a voice amid the noise.
Core technology
- Reliable
The reliability of CART is low. CART depends on its transcribers,
who are human beings. Common sense says human beings are more reliable than machines at
understanding natural language. However, this depends on professional ability and
varies between transcribers. A CART transcriber needs to be trained for a
long time to perform at a highly reliable level. The accuracy of the result can also be
influenced by many factors, such as environmental noise, the
transcriber’s understanding of the speech topic, typos, and even the physical
condition of the transcriber. The influence can be reduced as the transcriber’s
professional ability and experience grow, but it cannot be eliminated.
Our app uses high-accuracy, high-speed speech recognition technology. The stability
of the system is high and will not be affected by other factors.
- Enabling
CART ranks medium in core technology enabling.
CART is based on manual transcription and involves little hard technology. The only
factor to consider is the transcribers’ availability, in terms of time slot and
geolocation, which is not always stable.
Speech recognition, the technology our product is based on, is an emerging and
flourishing technology. It has entered the market and been widely used for several
years. More and more research and development is focusing on this technology, so it will
get faster and more accurate in the coming years, which is also to our advantage.
Quality
- Craftsmanship
CART is a manual transcribing service so the craftsmanship is not applicable.
As for our product, the craftsmanship lies in the interaction design and the interface
design of the app. It is much easier for an emerging company to build a high-craftsmanship
software product than a high-craftsmanship hardware product. Once our product is well
designed and developed, it can be reproduced without limit on every user’s device, so the
craftsmanship of our product can be high.
- Durability
CART is a manual transcribing service so the durability is not applicable.
Our product’s durability is high because it is a pure software solution, so the information
can easily be transferred from device to device. Users can change to any device and keep
using their own settings in their own account without noticing any difference. This provides
a good user experience and results in high durability.
Conclusion
ReCap provides a good solution for communicating with society for people who have
hearing disabilities and difficulties. Moreover, by providing instant transcription on every
mobile platform, our product will make daily life easier not only for users with hearing
disabilities but also for international students, international conferences and so on. In
the value opportunity analysis, our product exceeds existing products on the market in every
value opportunity, especially in emotion and ergonomics. With the technology ready and good
execution, our product will provide extra value for the target users and succeed in
the market.
4.7 Secondary Competitors
Apart from CART we do have some indirect competitors who use speech recognition
technology, but in a different market space.
Transcense: An Android app that uses speech recognition to convert speech to text in group
conversations. The app connects to several phones and activates their mics to capture
what everyone is saying, then uses voice recognition to assign each person in the group a
color for their speech bubbles.
Transcense requires each speaker to have a phone/device near them and requires that device
to be connected to the end user’s phone via wifi. Hence it doesn’t work in an environment
without internet, whereas ReCap does. This gives us a wider usability range as well as more
flexibility, as ReCap can function with just one device.
Nuance Dragon: Converts speech to text. It is mainly used for taking notes, writing emails,
etc., and has been on the market for a long time. It is used by iCommunicator, which targets
the deaf community by translating speech to sign language.
Our research tells us that a large number of hard of hearing individuals do not know or use
sign language, hence we target a market that is not occupied by iCommunicator. Also,
Nuance requires a good internet connection whereas ReCap doesn’t.
5 About the Business
5.1 Market Size
We are targeting people who are hard of hearing and who rely on external devices such as cochlear implants and hearing aids to help them hear better. Since our product provides one-directional speech-to-text capability, we assume that it will be most valuable to people who can speak comfortably and can respond, but need help understanding the conversations around them. Currently, we plan to deploy the product in classrooms so that hard of hearing students can participate better in classroom discussions and follow the professor’s lecture in a readable form.
Therefore, our ideal target customers are hard of hearing students whose primary language of communication is English.
The global population with hearing disability is 360 million (5% of the total population). This population can be segmented into two parts: completely deaf and hard of hearing. It is estimated that about 12% of the hearing impaired population is completely deaf (footnote 1).
Hence, this opens up nearly 88% of the market, about 316.8 million people, which is the total servable market.
Out of this total addressable market, our target market is the hard of hearing community who are educated, by which we mean the hearing impaired population with 12+ years of education. It is assumed that this population understands English. Assuming that about 9% of the hard of hearing community is educated, our target market is 9% of 316.8 million = 28,512,000.
1 In 2011, 30 million out of 250 million were completely deaf, which is around 12%.
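The market sizing above is a chain of percentages, sketched below with integer arithmetic so that the totals match the text exactly. The constant and function names are hypothetical; the percentages are the assumptions stated in this section.

```python
HEARING_DISABLED_POPULATION = 360_000_000  # global figure quoted in this section
COMPLETELY_DEAF_PERCENT = 12               # share assumed to be completely deaf
EDUCATED_PERCENT = 9                       # assumed share with 12+ years of education


def servable_market() -> int:
    """Hard of hearing population: everyone who is not completely deaf."""
    return HEARING_DISABLED_POPULATION * (100 - COMPLETELY_DEAF_PERCENT) // 100


def target_market() -> int:
    """Educated share of the hard of hearing population."""
    return servable_market() * EDUCATED_PERCENT // 100


print(servable_market())  # 316800000
print(target_market())    # 28512000
```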
5.2 Go to Market Strategy
Based on our detailed stakeholder analysis (enclosed in the appendix), we realized that
schools and educational institutions are the strongest influencers in the value chain that
leads to our target market of hard of hearing students. Hence, our product will be
offered to students via deaf schools and the disability centers of regular educational
institutions catering to the hard of hearing. This has the added advantage of
allowing verification of the end users, making sure that the product reaches the
intended users. This is important since the device could be misused by hearing students
to record class lectures and circulate them.
5.3 Business Model
The revenue model would be a monthly subscription at about $200 a month per user, with
group discounts of up to 50%. Based on our research and study of the
expenses of the technology currently used in deaf schools, our service will be disruptive
since it is a superior technology at a lower cost.
Our product will be an app that can be downloaded only by hard of hearing students on
their phone/iPad. Additionally, we intend to provide special mics to the professors to
improve the speech capturing process. The mic would be a one-time purchase for the
institutions.
There are about 16,000 education institutions across the world (assume that half of them have
at least 1 deaf student) and about 90 deaf schools (assume that they will buy the service
for all their students, and that the average number of students is 100) in the US alone. Assuming a
penetration of about 20% in two years, revenue will be $3.84 million annually (16000 * 50% *
20% * 200 * 12) from public schools and $2.16 million annually (90 * 20% * 100 * 100 * 12)
from deaf schools. For the one-time mic/mic base set purchases, we will have $85,600
(16000 * 50% * 20% * 40 + 90 * 20% * 30 * 40) for the first two years.
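The revenue figures above can be reproduced with a minimal sketch. The Python below mirrors the three formulas in the text; all names are hypothetical. Several readings are our assumptions, not stated explicitly in the report: the $100 deaf-school rate is taken to be the $200 price after the 50% group discount, the factor 40 is taken to be the price of one mic set in dollars, the factor 30 is taken to be the number of mic sets per deaf school, and each regular institution is taken to have one subscriber and one mic set.

```python
# Revenue sketch for Section 5.3. Every constant comes from the formulas
# in the text; the interpretation of 100, 40 and 30 is an assumption.

INSTITUTIONS = 16_000            # education institutions
WITH_DEAF_STUDENT_PCT = 50       # percent assumed to have at least 1 deaf student
DEAF_SCHOOLS = 90                # deaf schools
STUDENTS_PER_DEAF_SCHOOL = 100   # assumed average enrollment
PENETRATION_PCT = 20             # assumed penetration in two years
MONTHLY_PRICE = 200              # dollars per user per month
GROUP_PRICE = 100                # assumed: $200 with the 50% group discount
MIC_SET_PRICE = 40               # assumed: dollars per mic/mic base set
MIC_SETS_PER_DEAF_SCHOOL = 30    # assumed: mic sets bought per deaf school


def revenue_estimate():
    """Return (public annual, deaf-school annual, one-time mic) in dollars."""
    institutions_won = (
        INSTITUTIONS * WITH_DEAF_STUDENT_PCT * PENETRATION_PCT // 10_000
    )
    deaf_schools_won = DEAF_SCHOOLS * PENETRATION_PCT // 100

    # One subscriber per regular institution at the full monthly price.
    public_annual = institutions_won * MONTHLY_PRICE * 12
    # Every student in a deaf school at the discounted group price.
    deaf_annual = deaf_schools_won * STUDENTS_PER_DEAF_SCHOOL * GROUP_PRICE * 12
    # One-time mic purchases across both customer types.
    mic_one_time = (
        institutions_won * MIC_SET_PRICE
        + deaf_schools_won * MIC_SETS_PER_DEAF_SCHOOL * MIC_SET_PRICE
    )
    return public_annual, deaf_annual, mic_one_time


public_annual, deaf_annual, mic_one_time = revenue_estimate()
print(f"Public schools, annual: ${public_annual:,}")
print(f"Deaf schools, annual:   ${deaf_annual:,}")
print(f"Mic sets, one-time:     ${mic_one_time:,}")
```

Keeping each assumption as a named constant makes it easy to test other penetration rates or discount levels.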
The detailed business model canvas is enclosed in the appendix.
5.4 Financial
Fixed costs include rent, utility bills, phone bills/communication costs,
accounting/bookkeeping, legal/insurance/licensing fees, postage, technology, advertising
& marketing, and salaries. Variable costs are mainly materials and supplies, and packaging of the
mic.
Customer demand for ReCap is expected to increase steadily during the initial period. Once a
customer (school or institute) has started using ReCap, it is hard for them to find a
replacement that will provide them more value.
The profit from the first two years will be reinvested in expanding our customer
base.
ReCap’s profitability is high. Because of the breakthrough technology ReCap is based on,
the profit margin will remain high even at a disruptive price.
ReCap expects to break even in 1 year.
5.5 Funding Requirements
ReCap is seeking seed funding in the amount of $1.2M for staffing, purchasing
software and hardware computing equipment, office costs, and other Internet-related
costs. This funding will also be used for developing the initial version of our products and
for pilot testing.
The company is also seeking Series A funding in the amount of $3 million for developing
more markets, improving the product and building more web service facilities.
6 Project Development Process
We took a top-down approach to identify the right product-market fit. As a first
step, we brainstormed at least 20 applications that could leverage the speed and
accuracy that are the unique selling points (USPs) of the speech processing platform. We took
a structured approach to filtering these applications in order to zero in on the most lucrative
application that is viable and profitable. Each application was described through a use case,
after which we analyzed the following questions:
1. Is there an actual pain point that we are solving?
2. If yes, is the pain point big enough for the customer to buy our product? In other words,
are we giving the 10x difference?
3. How urgent is the need?
4. What are the market trends driving the industry?
5. Is the market saturated? How is the competitive landscape of the industry?
6. Is there a niche market that we could find in the big market?
7. Is it technically feasible?
Of the questions above, the first three were used as the most
important criteria for eliminating or advancing an application idea. The information was
obtained through interviews with actual stakeholders relevant to each use case. After validating the
applications against market needs, we were able to eliminate most of them and come
down to three promising applications. We used a second-level filter of market
and competitor analysis to eliminate two of these, since they belonged to the home
automation space, which was already saturated.
At the end of this process, we found a pain point faced by the handicapped,
particularly the deaf and hard of hearing: they are not able to
communicate or follow conversations effectively because they cannot hear properly. This is
a big hindrance that renders them unemployable
despite being well educated. We saw the potential of our speech processing technology,
which not only solves the consumer’s pain point but also leverages the speed and
accuracy of the technology, since real-time translation requires such attributes. Therefore, an
ideal target-market fit was identified.
Initially, we had envisioned our product as a device that allows two-way communication:
the deaf person gestures into a phone/computer, which converts the signs to voice for a hearing person,
and vice versa, speech is converted into sign and appears as an avatar on the screen for the
deaf person. However, achieving this product entailed a huge scope of technical
development. As an alternative, we explored a one-way communication
established through speech-to-text conversion.
We used end user input at every decision juncture in our commercialization process. For
example, we had assumed that our product would provide more value only if it were bidirectional, so that
both hearing and deaf people could communicate effectively. Contrary to our belief, the end user, who was
a hard of hearing student from the Tepper School of Business, CMU, found more value in a
simple speech-to-text application that would allow her to follow and comprehend
conversations better, especially in noisy situations. This was a turning point in our
commercialization process, where we had to pivot from catering to the completely deaf by
providing a high-end gesture-speech-gesture application to a much simpler speech-to-text
conversion. This pivot changed the target market, the use case and the product design
that we had envisioned earlier, resulting in a much finer product-market fit.
Our target market was now identified as hard of hearing people who are not completely deaf,
who are educated and who can speak fluently. Secondly, we realized that this one-way
communication of speech to text is most effective in classrooms, where the professor does
most of the speaking and other interactions are minimal. Hence, the rest of our study centered
on the classroom scenario as the use case and hard of hearing students as the target
market.
7 Vision
7.1 ReCap as An Enterprise
ReCap would be a high-tech firm with a core competency in speech processing platforms.
Since our first product caters to the handicapped, our company’s core mission and values
would be to continue helping the handicapped through the power of technology. We
believe that this section of people is often ignored and that the power of technology has been
underused in this area. Having said this, we envision ReCap scaling up quickly to a
mid-size company with more intelligent product offerings in its portfolio.
7.2 Exit Strategy
There is a general trend of big players such as Microsoft and Google investing in
technologies for the handicapped. For example, Microsoft Research has partnered with a
firm in China to use Microsoft Kinect as a means to convert sign to speech and speech to
sign. As a corollary of these activities, we foresee a possible trend of the big players
scouting for companies operating in this niche area either for acquisition or joint venture.
As an exit strategy, we believe that eventually we would come onto their radar to be
acquired or to form long-term partnerships for future endeavours.
Appendices
I Business canvas
II Survey from Western Pennsylvania Deaf School
We provided survey forms to the deaf school to be filled out by the faculty. We will send these separately once we have them.
III Stakeholder Analysis
IV Things We Have Done
1. Idea brainstorming:
Top down approach; Came up with 20 ideas
2. Idea evaluation:
Validated the market needs through secondary research and talking to end users
3. Narrowed down on deaf device
4. Came up with long term vision:
Two way communication, which is sign to speech and speech to sign
5. Broke down the vision into three phases:
Speech -> Text, Speech <-> Text, Speech <-> Sign language
6. Evaluated technical feasibility
7. Interviewed end user to understand which phase had maximum value
8. Also identified the urgent need in this target market: educated hard of hearing people
9. Did market sizing to understand if market size is big enough
10. Identified use cases and target market to get a niche that had no competitors playing:
Classroom, seminars, conferences
11. Identified actual value to the end user:
Our interview with the end user revealed the key aspects of a product that
are most valuable to the user: independence, security and quality
transmission
12. Competitor analysis: Transcence, RogerVoice, etc.
13. Understood current use of technology: CART, cochlear implant (previous VOA)
14. Detailed VOA on all three phases
15. Based on insights from above points: We narrowed down to phase I since it had
maximum value
16. Interviewed professors to check if they are ok with such a product in class
17. Stakeholder analysis to understand stakeholders who influence the end user:
parents/friends and educational institutions
18. Customer discovery phase II:
Went to deaf school in Pittsburgh to understand current technology used and challenges
faced
Interviewed official in office of disability to understand CART
19. Identified value for immediate customers; cheaper cost (existing technologies are at very
high cost and not great performance)
20. Developed a go to market strategy via school institutions to mitigate misuse of product;
initial roll out planned in early 2015
21. Product design: came up with sketch
22. First stage industrial design
23. Product cost estimation
24. A complete business model canvas
V VOA for Phase 2 & 3 Product (previous plan)
1. Introduction of the competitor product
The product we compare to is the sign language translator using Microsoft Kinect. In late 2013, the Microsoft Research group released a prototype using Microsoft Kinect that allows the deaf/hard of hearing to communicate comfortably with people outside of their community using sign. The envisioned product would convert sign language to speech and text in real time and vice versa, thus bridging the communication gap. The prototype uses the Kinect as a depth camera to capture the gestures used in sign language. These are then converted into speech and text for the hearing person to understand. In the other direction, when the hearing person speaks, the mic captures the speech and converts it into sign language for the deaf person. This sign language is shown as an avatar on the screen, thus allowing seamless communication.
2. Introduction of our product
Our product, signtovoice, is a device that almost mimics an interpreter working alongside the deaf person. With high-end technology that converts sign to speech and vice versa, the deaf can now communicate beyond their usual world in the method they are most comfortable with: sign language. The software for the device can be downloaded as an app or web application onto a phone/tablet/desktop, where the screen is used to see the sign and capture the sign. Additionally, a small, low-cost, portable piece of hardware would contain the 3D camera that captures the sign and processes it into speech. The hardware will be ergonomically designed so that it can be attached to the phone as a case. On the other end, the mic captures the voice of the hearing person and converts it to sign, which appears on the screen of the portable device that the deaf person is carrying. In this way, we enable the deaf to communicate with the hearing with minimal additional hardware and low-latency translation.
3. VOA
VOA chart: Kinect Sign Language Translator and our product, each rated Low / Medium / High on the sub-values below (NA = not applicable to either product).
Emotion: sense of adventure, feel of independence, sense of security, sensuality, confidence, power
Aesthetics: visual, tactile (NA), auditory, olfactory (NA), gustatory (NA)
Impact: social, environmental
Identity: personality, point in time, sense of place
Ergonomics: ease of use, safety (NA), comfort
Core Technology: reliable, enabling
Quality: craftsmanship, durability
4. Explanation
Emotion
- Sense of adventure
Microsoft Kinect and our technology offer a similar sense of adventure because the primary functions of the two products are very similar; hence, both rank medium for sense of adventure.
- Feel of independence
Microsoft Kinect uses more hardware than our technology, so our technology gives a greater overall feel of independence than Microsoft Kinect.
- Sense of security
Microsoft Kinect and our product induce an almost identical sense of security in the hearing impaired individual.
- Confidence
Microsoft Kinect and our technology both give the hearing impaired individual a high level of confidence to communicate with the world. Both products make the user feel self-assured in a social scenario.
- Power
Microsoft Kinect ranks higher on power than our technology because Microsoft is an internationally recognized brand, which makes the user feel more powerful.
Aesthetics
- Visual
Our product ranks higher than Microsoft Kinect on visual aesthetics because it uses everyday electronic gadgets for input, whereas Microsoft Kinect uses additional hardware.
- Auditory
Our technology and Microsoft Kinect both rank moderate on auditory aesthetics. Both technologies use AI software to convert text/sign language to speech and hence sound similar.
Impact
- Social
Our technology has a moderate social impact, as we try to change the way the deaf community communicates with the world. Microsoft Kinect, on the other hand, ranks high on social impact because Microsoft, being a globally recognized brand, can impact the community on a greater scale with the resources it possesses.
- Environment
Both Microsoft Kinect and our product have low environmental impact, as both products are mostly software based and hence do not require extensive hardware and manufacturing processes.
Identity
- Personality
Using Microsoft Kinect Sign Language Translator and using our product are similar experiences. However, due to the brand image of Microsoft and Kinect, the former gives the user a stronger sense of personality.
- Point in time
Both products are good but not perfect in terms of timing. The technology is still developing, so speed and accuracy are not yet good enough.
- Sense of place
Kinect Sign Language Translator requires more hardware, which makes it more cumbersome. Our product uses image processing and gesture recognition, so it is more portable and suitable in more places.
Ergonomics
- Ease of use
Both products translate sign language without requiring the user to wear rings, gloves or other gadgets, so both are quite easy to use. Because Microsoft needs the Kinect to recognize gestures, we consider Kinect Sign Language Translator a little more difficult to use than our product.
- Safety
Safety is not relevant to these two products.
- Comfort
Kinect Sign Translator and our product are comfortable to use because people can use them naturally, without the restraints of other gadgets like rings or gloves.
Core Technology
- Reliable
Both products rank high in technology reliability. Our technology is provided by Carnegie Mellon University, which is a leader in this area. Kinect Sign Translator has powerful technology support from Microsoft, which might be even stronger than ours.
- Enabling
As mentioned above, both products have strong technology support, and Kinect’s might be even stronger than ours.
Quality
- Craftsmanship
Kinect Sign Translator and our product both have high craftsmanship. Kinect has the manufacturing support of Microsoft to produce high craftsmanship. Our product’s structure is simple to build, so finding a supplier with high craftsmanship should be easy.
- Durability
Our product has medium durability compared to Kinect Sign Translator’s low. Kinect Sign Translator depends heavily on the Kinect sensor and is not portable, which reduces the durability of the product, while our product is portable and easily replaceable.
5. Conclusion
From the value opportunity analysis, we can see that our phase 2 & 3 product has advantages over Microsoft Kinect in 5 items: feel of independence, visual, sense of place, ease of use and durability. However, our product falls short in 3 items: power, social and personality. This shows that we would face strong competition from Microsoft in the market and could hardly supersede Microsoft in some respects. So we decided to give up phases 2 & 3 and develop only phase 1 of our product.
References
1. http://www.who.int/mediacentre/factsheets/fs300/en/
2. http://www.gallaudet.edu/clerc_center/information_and_resources/info_to_go/resources/websites_of_schools_and_programs_for_deaf_students_.html
3. Hands up for help! Giving deaf children a fair chance at school, 2010
4. http://nces.ed.gov/pubs94/94394.pdf
5. http://www.academia.edu/1542352/Amazon_Closed_Captioning_Formal_Report
6. http://www.cs.rochester.edu/~sadilek/publications/Real-Time_Captioning_by_Groups_of_Non-Experts_UIST-12.pdf
7. http://www.start-american-sign-language.com/dianrez.html
8. http://www.phonak.com/com/b2c/en/products.html
9. http://www.necc.mass.edu/academics/support-services/learning-accommodations/deaf-and-hard-of-hearing-services/student-resources/accommodations-tipsheets/communication-access-realtime-translation/
10. https://www.youtube.com/watch?v=qn4B0gyDosA
11. https://graysdeafblog.wordpress.com/about/
12. http://misskatsmom.blogspot.com/
13. http://becomingdeaf.com/
14. http://lizsdeafblog.blogspot.com/p/contact-me.html
15. http://campustechnology.com/articles/2014/10/06/ga-tech-google-glass-app-does-captioning.aspx