non-repudiation in internet telephony.opendl.ifip-tc6.org/db/conf/sec/sec2007/kuntzesh07.pdf ·...

Non-Repudiation in Internet Telephony

Nicolai Kuntze1, Andreas U. Schmidt1, and Christian Hett2

1 Fraunhofer–Institute for Secure Information Technology SITRheinstraße 75, 64295 Darmstadt, Germany

{andreas.u.schmidt,nicolai.kuntze}@sit.fraunhofer.de2 ARTEC Computer GmbH

Robert-Bosch Straße 38, 61184 Karben, [email protected]

Abstract. We present a concept to achieve non-repudiation for naturallanguage conversations over the Internet. The method rests on chainedelectronic signatures applied to pieces of packet-based, digital, voicecommunication. It establishes the integrity and authenticity of the bidi-rectional data stream and its temporal sequence and thus the securitycontext of a conversation. The concept is close to the protocols for Voiceover the Internet (VoIP), provides a high level of inherent security, andextends naturally to multilateral non-repudiation, e.g., for conferences.Signatures over conversations can become true declarations of will inanalogy to electronically signed, digital documents. This enables bind-ing verbal contracts, in principle between unacquainted speakers, andin particular without witnesses. A reference implementation of a secureVoIP archive is exists.

1 Introduction

The latest successful example for the ever ongoing convergence of informationtechnologies is Internet based telephony, transporting voice over the Internetprotocol (VoIP). Analysts estimate a rate of growth in a range of 20% to 45% an-nually, expecting that VoIP will carry more than fifty percent of business voicetraffic (UK) in a few years [1]. The success of VoIP will not be limited to cablenetworks, convergent speech and data transmission will affect next generationmobile networks as well. The new technology raises some security issues. Foreavesdropping traditional, switched analogue or digital phone calls, an attackerneeds physical access to the transport medium. Digital networks are generallymore amenable to attacks, as holds already for ISDN and to a yet greater extentfor IP networks. Efforts to add security features to VoIP products are gener-ally insufficient, though proposals exist for the protection of confidentiality andprivacy. Secure VoIP protocols, using cryptographic protection of a call, wouldeven be at an advantage compared to traditional telephony systems. Protocolslike SRTP [2] can provide end-to-end security to phone calls, independently ofthe security of transport medium and communication provider.

With VoIP maturing, it becomes natural to ask for application-level securityin the context of IP telephony. Our purpose is to achieve non-repudiation in thiscontext, i.e., for speech over packet-oriented, digital channels, and in particularfor VoIP conversations. This means the capability to produce tenable evidence

2 Nicolai Kuntze, Andreas U. Schmidt, and Christian Hett

that a conversation with the alleged contents has taken place between two ormore parties. Ancillary information, e.g., that the conversation partners havedesignated, personal identities, and the time at which the conversation hastaken place, may be of utmost importance in this regard, either to establisha supporting plausibility, e.g., ‘caller was not absent during the alleged call’,or to express relevant semantic information, e.g., ‘telephonic order came inbefore stock price rose’. For electronic documents this kind of non-repudiationis commonly achieved by applying electronic signatures based on asymmetriccryptography. In the communication between several parties, the desired resultis a binding contract, and in analogy the central goal of the present contributionis a technology to establish binding verbal contracts without witnesses.

This subject has a long pre-history: As early as 1905, Edison proposed therecording of voice, which was patented 1911 [3]. With the advent of digitalsignature technology, Merkle [4] envisioned, referring to Diffie and Hellman that“Digital signatures promise to revolutionize business by phone”. However, workon non-repudiation of digital voice communication is scarce. The work mostclosely related to ours is the proposal in [5], resting on the theory of contractsand multi-lateral security [6]. It comprises a trusted third party (‘Tele-Witness’)that is invoked by communicating parties to securely record conversations andmake them available as evidence at any later point in time.

Non-repudiation of inter-personal communication is interesting because ofits inherent evidentiary value, exposed by forensic evaluation of the containedbiometric data, e.g., as an independent means of speaker identification [7, 8].Methods for the latter are advanced [9], yielding to recorded voice a high pro-bative force, e.g., in a court of law. In comparison to other media, specificfeatures of voice contribute to non-repudiation. Voice communication is inter-active [10] and enables partners to make further enquiries in case of insufficientunderstanding. This mitigates to some extent problems to which signed digitaldocuments are prone, e.g., misinterpretations due to misrepresentation, lack ofuniqueness of presentation, and inadvertent or malicious hiding of content [11].

We set out requirements for non-repudiation which are very particular in thecase of VoIP and other multi-media communication over IP, in Section 2 andpropose the method to meet them in Section 3. Section 4 analyses the securityof the method by listing and assessing the auditable information secured by it.A section which describes the implementation of a secure VoIP archive could,unfortunately, not be included for space restrictions. It may be found in [27]and at http://arxiv.org/abs/cs.CR/0701145. Conclusions and an outlookare found in Section 5.

2 Requirements for non-repudiation of conversations

From the schematic characterisation of non-repudiation in the standards [12,13], we focus on the secure creation of evidence for later forensic inspection.This overlaps with the basic information security targets integrity and avail-ability of the well-known CIA triad. To account for the particularities of thechannel, we here take a communication-theoretical approach to derive require-

362 Andreas Schmidt, Nicolai Kuntze, and Christian Hett

Non-Repudiation in Internet Telephony 3

ments for non-repudiation. The general characteristics of the class of electroniccommunication that we address are the same for a wide media range, comprisingaudio, video, and multi-media. In essence it is always a full duplex or multiplexchannel operating in real time using data packets, and we subsume communi-cation over those under the term conversation. Generic requirements for thenon-repudiation of conversations can be profiled for specific media, and wesometimes exemplary allude to the case of speech and VoIP. They are groupedaround the top level protection targets congruence and cohesion. We describethe latter and devise for each a minimal set of specific, but application- andtechnology-neutral requirements. The requirements are necessary preconditionsto achieve the protection targets, and are ordered by ascending complexity.

T1 Congruence. Communication theory and linguistics have establishedthat the attributions of meanings can vary between a sender and a receiver ofa message [14, Chapter 6], [15] — a basic problem for non-repudiation. Apartfrom the ambiguity of language, this implies particular problems for electroniccommunication channels and media. For digital documents bearing electronicsignatures, the presentation problem is addressed by invoking the ‘What YouSee is What You Sign’ (WYSIWYS [11]) principle. It is often tacitly assumedthat presentation environments can be brought into agreement for sender andreceiver of a signed document [16]. We term this fundamental target ‘congru-

ence’. It has special traits in the case of telephony. Essential for non-repudiationis the receiver’s understanding, which leads in analogy to the principle ‘What IsHeard Is What Is Signed’. But additionally it is indispensable to assure senders(speakers) about what precisely was received (heard).

R1.1 Integrity of the data in transmission, including technical environ-ments for sending and receiving them. For VoIP, this is to be addressed at thelevel of single RTP packets and their payloads and of an entire conversation.

R1.2 Treatment of losses in the channel must enable information ofsenders about actually received information. This is independent of methods foravoidance or compensation of losses, such as Packet Loss Concealment (PLC).Rather it means a secure detection of losses (enabled by fulfilled R1.1), enablinga proper handling on the application level as well as a later (forensic) inspection.

R1.3 User interaction policies and their enforcement finally use fulfilledR1.1 and 1.2 to ensure congruence in the inter-personal conversation. For elec-tronic documents this can simply amount to prescriptions about the technicalenvironments in which a electronically signed document must be displayed. Orit can be an involved scheme to guarantee the agreement of contents of docu-ments undergoing complex transformations [17, 18], e.g., between data formats.For speech, it can be realised in various ways taking into account the interactivenature of the medium. This is elaborated on in Section 3.5.

T2 Cohesion regards the temporal dimension of conversations. It means inparticular the protection and preservation of the sequence the information flowsin all directions of the channel. Again this is at variance with signed documents,where temporal sequence of communication is immaterial. Cohesion means to



establish a complete temporal context of a conversation usually even in absolute

time, since the temporal reference frame of a conversation can be meaningful.R2.1 Start times of conversations must be determined and recorded. This

is analogous to the signing time of documents (the assignment of which is arequirement for qualified signatures according to the EU Signature Directive).

R2.2 Temporal sequencing of conversations must be protected and re-lated to the reference time frame established by fulfilling R2.1.

R2.3 Continual authentication of communication devices and if possibleeven communication partners is necessary, e.g., to prevent hijacking.

R2.4 Determined break points must allow for non-repudiation of con-versations until they are terminated intentionally or inadvertently.

From the requirements analysis it is apparent that congruence and cohesionare complementary but not orthogonal categories. A specific profile for VoIP isnot formulated here for brevity, but rather included in the development of themethod below. It is understood that additionally the known standard require-ments for electronic signatures as declarations of will and for non-repudiationof electronically signed documents, which are rooted in the theory of multi-lateral security [19], must be taken into account. We do not address details ofuser authentication, consent to recording, general privacy, confidentiality, andinteraction with respect to the signing as a declaration of will proper. Nonethe-less, the method proposed below enables the secure recording and archiving topreserve the probative value of a conversation, as demonstrated in [27].

3 The method

The requirements (R2.4) entail that signing a entire conversation with a singleRSA signature by A is not viable, since this yields full disposal to determine(maliciously) the end time of signing of a conversation, and deprives B of anypossibility to control and verify this during conversation. The opposite approachto secure single packets does not assure cohesion (R2.2 in conjunction withR1.1), since single RTP-packets contain only little audio data which may theneasily be reordered. Apart from that, it would be computationally expensive.This is the prime motivation for the method we now present in general for thecase of a bilateral conversation between A and B, using, e.g., the SIP/RTPprotocol combination [20, 21]. In a basic model A secures the conversationas an unilateral declaration of will. We proceed in a bottom-up fashion fromthe base concept of intervals of VoIP data, over securing their integrity by acryptographic chain, to coping with inevitable packet loss. For later reference wecall the technique presented in 3.1— 3.4 below the interval-chaining method.

3.1 Building intervals

Intervals are the logical units on which the protection method operates. In-tervals span certain amounts, which may be nil, of RTP packets for only onedirection. As bi-directional communication needs formation of intervals for bothdirections, A and B hold buffers for packets both sent and received. Since di-rections are handled differently w.r.t. packet loss, as described in Section 3.3,



directionally homogeneous intervals are advantageous from a protocol designviewpoint. To resolve the full duplex audio stream into an interval sequencewe determine that intervals in the directions from and to A alternate. Intervalsare enumerated as I2k−1, I2k, k = 1, . . . , N for directions A → B and B → A,respectively. Interval Il comprises RTP packets (pl,j), j = 1, . . . ,Kl, sent orreceived by A. For the moment we assume that there is no packet loss.

The length of an interval (in appropriate units) is a main adjustable param-eter, and an important degree of freedom. Adjustable sizes of, e.g., data framesare not very common in communication technology, but recent proposals [23]show that they can be advantageous in certain situations, like the present one.We determine that interval boundaries are triggered by the elapse of a certaintime, called interval duration and denoted by D. If T is the duration of the con-versation then N = ⌈T/D⌉. Basing intervals on time necessitates the formationof intervals without voice data payload when a silence period exceeds D. Thisdesign choice entails some signalling, transport, and cryptographic overhead.This is however outweighed by some favourable properties. In particular, themaximum buffer length is known from the outset, and control of the intervalduration is a direct means to cope with the (known) slowness of (public key)cryptographic soft- and hardware. Adjustment of D therefore allows for an,even dynamical, trade-off between security and performance, as it controls theratio of security data to payload data. The alternative of triggering intervals byfull-run of packet buffers at both sides causes concurrency problems.

Since the communication channel is fully duplex, the sequence of intervalsdoes not reflect the temporal sequence of audio data, rather I2k−1 and I2k

comprise approximately concurrent data sent in both directions. But this isimmaterial since intervals are only logical units and security data for intervalscan be stored separately from the RTP streams. This is a key feature of ourmethod. It does not affect the VoIP communication at all but can be run incomplete — logical and even physical (extra hardware) — separation from it.VoIP communication is therefore not impeded by our method.

3.2 Cryptographic chaining

The basic idea is to cryptographically secure the payload contained in eachinterval and include the generated security data in the subsequent interval toform a cryptographic chain. We use the shorthand (·)X

def

= PrivX(h(·)) for entityX’ digital signature by applying a private key PrivX and a hash algorithm h(·).TS is a time-stamping authority. The notation −→ signifies the sending of somedata. To sign a conversation A performs the following operations.

SecI : MIdef

= (D,SIP_Data,Auth_Data, nonce, . . .) −→ B;

S0def

=((MI)A

)TS

−→ B;

Secl : Sldef

= (Il, Sl−1)A −→ B; l = 1, . . . , 2N

SecF : MFdef

= (termination_condition, . . .) −→ B;

SFdef

=((MF , S2N )A

)TS

−→ B;



In the initial step SecI , (·)TS means a time-stamp applied by TS, e.g., accordingto RFC 3161 [22], and is enveloping the meta-data MI signed by A (R2.1). Thismay include some authentication data Auth_Data , e.g., A’s digital certificates.To provide a broad audit trail for later inspection, data from the call nego-tiation and connection establishment, here subsumed under SIP_Data, shouldbe included. The final time-stamp can be used optionally to detect drift, andnarrows down the conversation in time. Since this is sufficient to secure thetemporal context required for cohesion, the application of time-stamps in everystep, which may be costly, is not proposed. A nonce is included in MI to pre-vent replay attacks. By including Sl−1 in the signed data Sl and S2N−1 in SF ,and alternation of interval directions, R1.1 and R2.2 are satisfied. Signaturesof A and additional authentication data in MI support R2.3. If communicationbreaks inadvertently, interval chaining is verifiable up to the last interval, thusR2.4 is satisfied, with a loss of at most one interval duration of conversationat its end. A controls interval timing and the operations SecI , Secl, and SecF

occur at times 0, ⌊l/2⌋ · D, and N · D, respectively.

3.3 Treatment of packet loss

Digital voice communication offers a rather high reliability leading generally toa higher understandability of VoIP communication in comparison with all pre-decessors. However, packet loss may occur and must be treated as explained inR1.2. Denote by δl ⊂ {1, . . . ,Kl} the sequence of identifiers of packets actuallyreceived by A respectively B. Intervals are reduced accordingly to I ′l

def

=(pl,j)j∈δl.

The steps Secl are modified by a protocol to report received packages.

Sec′2k−1 : repeat

repeat

interval_termination −→ B;

until δ2k−1 −→ A;

until S2k−1def

= (I ′2k−1, S2k−2)A −→ B;

Sec′2k : repeat

S2kdef

= (I ′2k, S2k−1)A −→ B;

δ2k −→ B;

until success;

This accounts for losses in the VoIP (RTP) channel as well as failures in thechannel for transmission of signing data. The loop conditions can be evaluatedby explicit (Sec′2k) or implicit (Sec′2k−1) acknowledgements by receivers.

3.4 Extension to multilateral conversations

Here we present the simplest way to extend the method above to conference-like situations. Multilateral non-repudiation means mutual agreement aboutthe contents of a conversation between all parties. For implementing it for Mparticipants A0, . . . , AM−1 a round-robin scheme [24] can be used to producethe required chain of signatures as in Lemma 1. Round-robin is a simple algo-rithm to distribute the required security data between the participants of the



conference. Other base algorithms of distributed systems like flooding, echo, orbroadcast might be used, depending, for instance, on the particular topology ofthe conference network. During the round, a token is passed from participantto participant, signalling the signer role. If participant Am carries the token, hewaits for time D and buffers packets sent by himself. When Am terminates theinterval a signalling and signing protocol is processed, which, in contrast to thescheme above, only concerns data sent by Am. The numbering of intervals is asfollows. In the time span from 0 to D the packets (pm;j) sent by Am are in theinterval Im. The packets emitted by Am during [D, 2D] are in IM+m, and soon. It is here not feasible to sign merely the packets received by everyone, be-cause cumulative packet loss could be too high. Instead, an additional hashingindirection is included and hashes Hθ

k

def

= (h(pk;j))j∈θ of all packets θ receivedby at least one person from Am in interval k are distributed and can be usedto check the signature in spite of packet loss. Let δσ

k denote the list of packetssent by Am and received by Aσ in interval k. Set Rm

def

= {0, ..,M − 1}\m andlet r ≥ 0 be the round number. In order to account for latencies in reporting ofpacket loss, computing hashes, and signing, we introduce a parallel offset in theround-robin scheme. In round r participant Am carrying the token terminatesinterval with number k(r, m)

def

= rM2 + (M + 1)m + 1. He secures the set of

intervals I(r, m)def

=(k(r, m) − M · {0, . . . ,M − 1}

)∩ N.

Sec_multr,m : ∀σ ∈ Rm do

repeat

interval_termination −→ Aσ;

until (δσk )

k∈I(r,m) −→ Am;

od;

θkdef

= ∪σ∈Rmδσk for k ∈ I(r, m);

Dr,mdef

=((δσ

k )σ∈Rm,Hθk

k

)k∈I(r,m)

;

Sr,mdef

= (Dr,m, Spred(r,m))Am;

∀σ′ ∈ Rm do

repeat

(Sr,m, Dr,m) −→ Aσ′ ;

until success;

od;

The preceding security value Spred(r,m) bears indices

pred(r, m) =

⎧⎪⎨

⎪⎩

(r, m − 1) if m ≥ 1;

(r − 1,M − 1) if r ≥ 1, m = 0;

I otherwise,

where I stands for the initialisation interval which can be constructed as inthe preceding sections, replacing single sending by broadcast with acknowl-edgements. The numbering scheme for Intervals and the evolving sequence of



D 2D 3D 4D 5D 6D 7D 8D 9D

A0 1 5 9 13 17 21 25 29 33

A1 2 6 10 14 18 22 26 30 34

A2 3 7 11 15 19 23 27 31 35

A3 4 8 12 16 20 24 28 32 36

Fig. 1. Numbering of intervals in the case of 4 participants along the time axis. Arrowsindicate the sequence of security values S. Thicker borders separate rounds. Equallycoloured intervals are secured in a single operation Sec_mult

r,m.

security values is shown in Figure 1 below. In effect, Am broadcasts (with ac-knowledgement) a signature over hashes of all packets received by at least oneother participant. This is the common security data with which the chain canbe continued. According to Lemma 1, non-repudiation of the total, multilateralconversation for the first interval duration from time 0 to D is achieved afterexecution of Sec_mult2,M−1 at time 2M · D. With each further execution ofSec_mult a subsequent piece of conversation of length D obtains multilateralnon-repudiation.

In case of call termination, 2M +1 finalisation steps without audio data (twofinal rounds plus finishing by the participant carrying the token at the time oftermination) are required to obtain non-repudiation of the last interval in time.Joining and leaving a signed multilateral call while the signature is created bythe participants can be enabled through finalisation. If participant B requeststo join the call, Am, who posses the token, initiates a finalisation and B canjoin after this (inserted as m + 1). In the case that a participant likes to leavehe awaits the token and finalises including a leave message.

3.5 Operational policies

We do not lay out a complete set of rules for the operation of a system usingthe non-repudiation method above. Rather we list the most obvious ones andstress the most important point of monitoring and treatment of packet loss, orrather understandability.

To account for requirement R1.3, users should be signalled at any timeduring a conversation about the signature status of it. This necessitates to anextent specified by application-specific policies the cryptographic verificationof the interval chaining, and continual evaluation of relevant information, seeSection 4.1. Additionally a secure voice signing terminal should control everyaspect of user interaction and data transmission. This is elucidated in [25].

To maintain congruence and mitigate attacks aiming at mutilating a conver-sation, packet loss and the ensuing level of understandability must permanentlybe monitored. When the packet loss is above a configurable threshold, an actionshould be triggered according to determined policies. The principle possibilitiesare: 1. ignore; 2. notify users while continuing signing; 3. abort the signing; and4. terminate call. The first two options open the path for attacks. Terminationof the call is the option for maximum security. From a practical viewpoint, theloss threshold is seldom reached without breakdown of the connection anywaydue to insufficient understandability or timeouts.



Options 3 and 4 provide a ‘Sollbruchstelle’ (predetermined break point) forthe probative value of the conversation. In contrast, most other schemes forsecuring the integrity of streamed data, e.g., the signing method of [26] aim atloss-tolerance, for instance allowing for the verification of the stream signaturewith some probability in the presence of packet loss. We suggest that for theprobative value of conversations, the former is advantageous. A signed call withan intermediate gap can give rise to speculations over alternatives to fill it, whichare restricted by syntax and grammar, but can lead to different semantics. Usingthis, a clever and manipulative attacker could delete parts of the communicationto claim with certain credibility that the remnants have another meaning thanintended by the communication partner(s). If the contents of a conversationafter such an intentional deletion are unverifiable and thus cannot be used toprove anything, this kind of attack is effectively impeded.

4 Security considerations

We corroborate the statement that interval chaining can achieve non-repudiationfor VoIP conversations, based on the information generally secured by intervalchaining. An analysis based on an instance of a system architecture and possibleattacks is contained in [27].

4.1 Auditable information

In this section we analyse the information that can be gained and proved tohave integrity in a call secured by interval chaining. Table 1 gives a, perhapsincomplete, overview over this audit data, which may be amenable to foren-sic inspection, e.g., by an expert witness in court, or, on the other extreme,applicable during the ongoing conversation, or both.

4.2 Comparison with SRTP and IPsec

The well-known security methods SRTP and IPsec address the protection ofconfidentiality, authenticity and data integrity on the application, respectivelynetwork layer, and can be applied to VoIP and as well in parallel with interval-chaining. We want to show salient features of interval-chaining, which distin-guishes it from both standards and in our view provides a higher level of non-repudiation and even practicality. On the fundamental level, both SRTP andIPsec necessarily operate on the packet level and do not by themselves pro-vide protection of the temporal sequence and cohesion of a VoIP conversation.While it is true that pertinent information can be reconstructed from the RTPsequence numbers, in turn protected by hash values, such an approach wouldhave some weaknesses, which taken together do not allow full non-repudiation.In particular, RTP sequence numbers can suffer from roll-overs and though theirintegrity is secured in transmission, they can still be rather easily be forged bythe sender, since they belong to protocol stacks which are not especially se-cured in common systems. While packet loss can be detected or reconstructedusing sequence numbers, interval chaining yields a well-defined, tunable, and

cryptographically secured means to deal with it during an ongoing conversation,



Auditable item Req. Protectiontarget

Verifies/indicates When ap-plicable

Initial time stamp 2.1 Cohesion Start time Always

Initial signature & certificate 2.3 Cohesion Identity of signer Always

Interval Chaining 2.2, 1.1 Cohesion Interval integr. & order Always

Packet loss in intervals 1.2, 2.4 Congruence QoS, understandability Always

Monotonic increase of RTP-sequence numbers

1.1, 2.2 Integrity &cohesion

RTP-stream plausibility Always

Relative drift of RTP-timemarks against system time

2.2 Cohesion RTP-stream plausibility Duringconvers.

Relative drift of RTP-timemarks against ⌊l/2⌋ · D

2.2 Cohesion Packet & stream plausi-bility

Ex post

No overlaps of RTP-timemarks at interval boundaries

2.2 Cohesion Packet & stream plausi-bility

Always

Replay-window Integrity Uniqueness of recordedaudio stream

Always

Final time stamp 2.2 Cohesion Conversation duration Ex post

Forensic analysis of recordedconversation

(Semantic)authenticity

Speaker identity, mood,lying, stress, etc.

Ex post,forensic

Table 1. Auditable information of a conversation secured with interval chaining.Columns: Secured data item audited, Non-repudiation requirement addressed, Pro-tection target supported, Actual information indicated or verified, and when is thecheck applicable.

significantly limiting potential attack vectors. In essence, RTP sequence num-bers are not designed to ensure a conversation’s integrity and thus have lowerevidentiary value in comparison to chained intervals. From the viewpoint ofelectronic signatures, their level of message, respectively, conversation authen-tication can only be achieved with an protocol-independent means to manageauthentication data such as asymmetric keys, i.e., a Public Key Infrastructure.The connection and session dependent key handling of IPsec and SRTP, relyingon HMACs and merely allowing for symmetric keys deprived of authenticationsemantics, are generally insufficient for non-repudiation. Interval chaining is anindependent means to control the cryptographic workload benefiting scalability.Finally, NAT traversal is a problem for network layer integrity protection likeIPsec since rewriting IP headers invalidates corresponding hash values (a solu-tion has been proposed by TISPAN [28]). This problem does not occur with theinterval chaining method, since only RTP headers, not IP headers of packetsneed to be (and are in the implementation below) signed.

5 Conclusions

IP-based multimedia communication is not restricted to VoIP, for instance bynow, several video conferencing systems are maturing, some of which are basedon sophisticated peer-to-peer communication [29]. Moreover the service qualityand availability of the new communication channels is constantly increasingthrough developments like packet loss concealment (PLC) for audio [30] and



even video [31] streams. Our proposed method for non-repudiation is applicablein all these contexts. Its adoption would pave the way for a new paradigm fortrustworthy, inter-personal communication. An efficient self-signed archive forVoIP calls and its system architecture was implemented as a prototype togetherwith a verification and playback tool. It requires no modification to the terminalequipment, and secures the ongoing conversations ‘on the fly’ [27].

The next steps in our research are to i) implement the operational con-text for electronic signatures over speech, i.e., user interaction and signalling,ii) devise a trustworthy signature terminal for that purpose, preferably usingTrusted Computing technology on mobile devices, e.g., to secure audio I/O andprocessing, iii) extend the method to conferences and other media than VoIP.

References

1. Kavanagh, J.: Voice over IP special report: From dial to click.http://www.computerweekly.com/Articles/2006/02/14/214129/

VoiceoverIPspecialreportFromdialtoclick.html, visited 1.3.2006.2. Baugher, M., et al.: The Secure Real-time Transport Protocol (SRTP). RFC 3711,

IETF, March 2004. http://www.ietf.org/rfc/rfc3711.txt3. Edison, T.A.: Recording-telephone. United States Patent P.No.:1,012,250, United

States Patent Office, Washington, DC (1911) Patented Dec. 19, 1911.4. Merkle, R.C.: A certified digital signature. In Brassard, G., ed.: Advances in

Cryptology (CRYPTO ’89). Number 435 in LNCS, Springer-Verlag (1989) 218–238 Republication of the 1979 original.

5. Strasser, M.: Möglichkeiten zur Gestaltung verbindlicher Telekooperation. Mas-ter’s thesis, Universität Freiburg, Institut für Informatik und Gesellschaft (2001)

6. Kabatnik, M., Keck, D.O., M. Kreutzer, A.Z.: Multilateral security in intelligentnetworks. In: Proceedings of the IEEE Intelligent Network Workshop. (2000)59–65

7. Poh, N., Bengio, S.: Noise-Robust Multi-Stream Fusion for Text-IndependentSpeaker Authentication. In: Proceedings of The Speaker and Language Recogni-tion Workshop (Odyssey). (2004)

8. Rodriguez-Linares, L., Garcia-Mateo, C.: Application of fusion techniques tospeaker authentication over IP networks. IEEE Proceedings-Vision Image andSignal Processing 150 (2003) 377–382

9. Hollien, H.: Forensic Voice Identification. Academic Press, London (2001)10. Goodwin, C.: Conversational organization: Interaction between speakers and hear-

ers. Academic Press, New York (1981)11. Landrock, P., Pedersen, T.: WYSIWYS? What You See Is What You Sign?

Information Security Technical Report, 3 (1998) 55–6112. ISO: Information Technology: Security Frameworks for Open Systems: Non-

Repudiation Framework. Technical Report ISO10181-4, ISO (1997)13. ISO: Information Technology: Security Techniques - Non Repudiation - Part 1:

General. Technical Report ISO13888-1, ISO (1997)14. Searle, J.R.: Mind, Language and Society. Basic Books, New York (1999)15. Austin, J.L.: How to Do Things with Words. Harvard University Press, Cam-

bridge, Mass. (1962)16. Schmidt, A.U.: Signiertes XML und das Präsentationsproblem. Datenschutz und

Datensicherheit 24 (2000) 153–158



17. Schmidt, A.U., Loebl, Z.: Legal security for transformations of signed documents:Fundamental concepts. In Chadwick, D., Zhao, G., eds.: EuroPKI 2005. Volume3545 of Lecture Notes in Computer Science., Springer-Verlag (2005) 255–270

18. Piechalski, J., Schmidt, A.U.: Authorised translations of electronic documents.In Venter, H.S., Eloff, J.H.P., Labuschagne, L., Eloff, M.M., eds.: Proceedings ofthe ISSA 2006 From Insight to Foresight Conference, Information Security SouthAfrica (ISSA) (2006)

19. Rannenberg, K., Pfitzmann, A., Müller, G.: IT Security and Multilateral Security.In Müller, G., Rannenberg, K., eds.: Multilateral Security in Communications.Volume 3 of Technology, Infrastructure, Economy., Addison-Wesley (1999) 21–29

20. Rosenberg, J., et al.: SIP: Session Initiation Protocol. RFC 3261, IETF, June2002. http://www.ietf.org/rfc/rfc3261.txt

21. Schulzrinne, H., et al.: RTP: A Transport Protocol for Real-Time Applications.RFC 1889, IETF, January 1996. http://www.ietf.org/rfc/rfc1889.txt

22. Adams, C., et al.: Internet X.509 Public Key Infrastructure Time-Stamp Protocol(TSP). RFC 3161, IETF, August 2001. http://www.ietf.org/rfc/rfc3161.txt

23. Choi, E.C., Huh, J.D., Kim, K.S., Cho, M.H.: Frame-size adaptive MAC protocolin high-rate wireless personal area networks. ETRI Journal 28 (2006) 660–663

24. Shreedhar, M., Varghese, G.: Efficient fair queuing using deficit round-robin.IEEE/ACM Transactions on Networking 4 (1996) 375–385

25. Hett, Ch., Kuntze, N., Schmidt, A. U.: Security and non repudiation of Voice-over-IP conversations. To appear in: Proceedings of the Wireless World ResearchForum (WWRF17), Heidelberg, Germany, 15-17 November 2006.

26. Perrig, A., Tygar, J.D., Song, D., Canetti, R.: Efficient authentication and signingof multicast streams over lossy channels. In: SP ’00: Proceedings of the 2000 IEEESymposium on Security and Privacy, Washington, DC, USA, IEEE ComputerSociety (2000) 56–75

27. Hett, C., Kuntze, N., Schmidt, A.U.: A secure archive for Voice-over-IP conversa-tions. In et al., D.S., ed.: To appear in the Proceedings of the 3rd Annual VoIP Se-curity Workshop (VSW06), ACM (2006) http://arxiv.org/abs/cs.CR/0606032

28. Telecoms & Internet converged Services & Protocols for Advanced Networks(TISPAN) http://www.tispan.org/, see also the Whitepaper http://www.

newport-networks.com/cust-docs/91-IPSec-and-VoIP.pdf

29. Zühlke, M., König, H.: A signaling protocol for small closed dynamic multi-peergroups. In: Proceedings of High Speed Networks and Multimedia Communica-tions, 7th IEEE International Conference (HSNMC 2004), Toulouse, France. Vol-ume 3079 of LNCS., Springer-Verlag (2004) 973–984

30. Perkins, C., Hodson, O., Hardman, V.: A survey of packet loss recovery techniquesfor streaming audio. IEEE Network 12 (1998) 40–48

31. Zhu, Q.F., Kerofsky, L.: Joint source coding, transport processing, and errorconcealment for H.323-based packet video. In Aizawa, K., Stevenson, R.L., Zhang,Y.Q., eds.: Visual Communications and Image Processing ’99. Volume 3653 ofProceedings of SPIE., SPIE (1998) 52–62

32. Kolletzki, S.: Secure internet banking with privacy enhanced mail - a protocol forreliable exchange of secured order forms. Computer Networks and ISDN Systems28 (1996) 1891–1899

33. Grimm, R., Ochsenschläger, P.: Binding Cooperation. A Formal Model for Elec-tronic Commerce. Computer Networks 37 (2001) 171–193


non-repudiation in internet telephony.opendl.ifip-tc6.org/db/conf/sec/sec2007/kuntzesh07.pdf ·...

Documents