Tuesday, November 17, 2009

JCOP Smartcard Performance

Abstract
I use NXP JCOP smart-cards for prototyping in my research. I have recently benchmarked the cards using my research code, which contains computational and cryptographic workloads.

I found a couple of surprising results that I want to share, so fellow developers can make informed decisions when choosing their prototyping platforms.

Findings
There is a 1.5-2x speed difference between the different revisions of the same high-end chip, the NXP JCOP41 with 72kb of EEPROM. The V2.2 revision for smart-cards (no longer available, replaced by 2.2.1) has the best performance, and the V2.2.1 revision for the SIM (ID 000) form factor has the worst performance.

There is a significant speed difference between the same revision (V2.2.1) of the same chip (NXP JCOP41, 72kb of EEPROM), in different form factors. The chip for the smart-card form factor is almost as fast as the older V2.2 revision, while the chip for the SIM (ID 000) form factor is significantly slower.

There is a 2-4x speed difference between the same revisions (V2.2, smart-card form factor) of the NXP JCOP31 and the NXP JCOP41 chips.

The 3DES encryption/decryption engine has non-linear performance. The time it takes to decrypt 128 bytes is not very different from the time it takes to decrypt 24 bytes, on the JCOP41 chips. It seems that there is a huge setup cost for the DES engine, which outweighs the actual encryption cost.
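As a rough check on the setup-cost idea, here is a minimal Python sketch that fits a two-point linear model, time = fixed + per_byte * n, to the 24-byte and 128-byte 3DES timings from the Data section. The model and the names `fit_linear` and `timings` are my own assumptions for illustration. The fixed term lumps together everything that does not depend on payload size (APDU round trip, interpreter overhead, key and cipher initialization), so it is an upper bound on the DES engine setup cost rather than a direct measurement of it.

```python
# Timings (seconds) copied from the Data section: (decrypt_3des, decrypt_3des_long).
SHORT_BYTES, LONG_BYTES = 24, 128

timings = {
    "JCOP41 v2.2": (0.05803, 0.08042),
    "JCOP41 v2.2.1": (0.08420, 0.10800),
    "JCOP41 v2.2.1 USB token": (0.19010, 0.21410),
    "JCOP31 v2.2": (0.23553, 0.50060),
}


def fit_linear(t_short, t_long):
    """Fit time = fixed + per_byte * n through the two measured points."""
    per_byte = (t_long - t_short) / (LONG_BYTES - SHORT_BYTES)
    fixed = t_short - per_byte * SHORT_BYTES
    return fixed, per_byte


for card, (t_short, t_long) in timings.items():
    fixed, per_byte = fit_linear(t_short, t_long)
    share = fixed / t_long  # fraction of the 128-byte run that is size-independent
    print(f"{card}: fixed={fixed * 1000:.1f} ms, "
          f"per byte={per_byte * 1000:.3f} ms, fixed share={share:.0%}")
```

Under this model, the size-independent part accounts for well over half of the 128-byte run on all three JCOP41 variants, while on the JCOP31 the per-byte cost dominates.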

There is roughly a 5-9x speed difference between RSA and 3DES decryption of 128 bytes on the JCOP41 chips, and a 3x speed difference on the JCOP31 chip. This goes against the conventional wisdom that symmetric encryption is two orders of magnitude faster than asymmetric encryption. This is probably due to the time it takes to set up the 3DES engine.
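The RSA-to-3DES ratios can be recomputed from the 128-byte timings in the Data section. The short Python sketch below does only that arithmetic; the dictionary names are mine, and the numbers are copied from the decrypt_rsa_long and decrypt_3des_long rows.

```python
# 128-byte decryption timings (seconds) copied from the Data section.
rsa_long = {
    "JCOP41 v2.2": 0.74047,
    "JCOP41 v2.2.1": 0.76480,
    "JCOP41 v2.2.1 USB token": 1.05600,
    "JCOP31 v2.2": 1.54990,
}
des_long = {
    "JCOP41 v2.2": 0.08042,
    "JCOP41 v2.2.1": 0.10800,
    "JCOP41 v2.2.1 USB token": 0.21410,
    "JCOP31 v2.2": 0.50060,
}

# How many times slower RSA is than 3DES on the same card and payload size.
ratios = {card: rsa_long[card] / des_long[card] for card in rsa_long}

for card, ratio in ratios.items():
    print(f"{card}: RSA is {ratio:.1f}x slower than 3DES")
```

Every ratio stays in the single digits, far from the 100x that the rule of thumb suggests.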

Conclusion
Secure processors in smart-cards have non-obvious performance characteristics. I hope my work saves you from the unpleasant surprises that I had.

Motivation
To the best of my knowledge, there are no easy-to-find benchmarks for smart-card processors. At least, I couldn't find anything when I searched.

Smart-card retailers disclose vital specifications, like EEPROM size and the cryptographic primitives that are implemented on the chip, but tend to be quiet about speed. The sites I used don't mention anything about the type of processor used, or the frequency of the processor.

For some applications (e.g. prototyping, where I want my unit tests to run quickly), speed is just as critical as the other specifications, and it's more important than cost.

Data
The data that I used to reach my conclusions is available below. The benchmarks are described in section 5.1 (page 13) in my paper on a successor to the TPM.

decrypt_3des decrypts 24 bytes of data, while decrypt_3des_long and decrypt_rsa_long work on 128 bytes of data. 3DES is configured in EDE-CBC mode (112 bits of key material) and uses ISO 9797-1 method 2 for padding. RSA decryption uses PKCS#1 padding.
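For readers unfamiliar with ISO/IEC 9797-1 padding method 2, here is a minimal Python sketch of the scheme: append a single 0x80 byte, then zero bytes up to the next multiple of the cipher block size (8 bytes for 3DES). This is a generic illustration with function names I made up, not the code from the applet. The post also does not say whether the 24-byte and 128-byte sizes are counted before or after padding, so the lengths below only show what the padding does to inputs of those sizes.

```python
BLOCK_SIZE = 8  # DES / 3DES block size in bytes


def pad_iso9797_m2(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append 0x80, then zero bytes up to a multiple of block_size."""
    padded = data + b"\x80"
    padded += b"\x00" * (-len(padded) % block_size)
    return padded


def unpad_iso9797_m2(padded: bytes) -> bytes:
    """Strip trailing zeros and the 0x80 marker; reject malformed input."""
    stripped = padded.rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("missing 0x80 padding marker")
    return stripped[:-1]


for size in (24, 128):
    padded = pad_iso9797_m2(bytes(size))
    print(f"{size} bytes -> {len(padded)} bytes "
          f"({len(padded) // BLOCK_SIZE} blocks)")
```

Because the marker byte is always added, an input that is already block-aligned grows by one full block.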

The benchmarks can be reproduced by installing RubyGems, then installing the tem_ruby gem, and issuing the following commands:
tem_upload_fw  # Uploads my JavaCard applet to the active smart-card.
tem_bench  # Runs the benchmarks.


NXP JCOP41 v2.2/72k (no longer available)
time_blank_bound_secpack_3des: 0.20757s
time_blank_bound_secpack_rsa: 0.86173s
time_blank_sec: 0.18017s
time_devchip_decrypt_3des: 0.05803s
time_devchip_decrypt_3des_long: 0.08042s
time_devchip_decrypt_rsa_long: 0.74047s
time_post_buffer: 0.08280s
time_simple_apdu: 0.00515s
time_vm_perf: 0.73887s
time_vm_perf_bound_3des: 0.78137s
time_vm_perf_bound_rsa: 1.43647s


NXP JCOP41 v2.2.1/72k (usasmartcard.com product link)
time_blank_bound_secpack_3des: 0.24155s
time_blank_bound_secpack_rsa: 0.89740s
time_blank_sec: 0.18937s
time_devchip_decrypt_3des: 0.08420s
time_devchip_decrypt_3des_long: 0.10800s
time_devchip_decrypt_rsa_long: 0.76480s
time_post_buffer: 0.08577s
time_simple_apdu: 0.00610s
time_vm_perf: 0.83257s
time_vm_perf_bound_3des: 0.90033s
time_vm_perf_bound_rsa: 1.55637s


NXP JCOP41 v2.2.1/72k USB token (usasmartcard.com product link, probably using this card)
time_blank_bound_secpack_3des: 0.41070s
time_blank_bound_secpack_rsa: 1.23089s
time_blank_sec: 0.34530s
time_devchip_decrypt_3des: 0.19010s
time_devchip_decrypt_3des_long: 0.21410s
time_devchip_decrypt_rsa_long: 1.05600s
time_post_buffer: 0.17213s
time_simple_apdu: 0.01000s
time_vm_perf: 1.11310s
time_vm_perf_bound_3des: 1.19703s
time_vm_perf_bound_rsa: 2.01420s


NXP JCOP31 v2.2 (usasmartcard.com product link)
time_blank_bound_secpack_3des: 0.84673s
time_blank_bound_secpack_rsa: 1.78957s
time_blank_sec: 0.78120s
time_devchip_decrypt_3des: 0.23553s
time_devchip_decrypt_3des_long: 0.50060s
time_devchip_decrypt_rsa_long: 1.54990s
time_post_buffer: 0.88864s
time_simple_apdu: 0.02813s
time_vm_perf: 1.84374s
time_vm_perf_bound_3des: 1.92594s
time_vm_perf_bound_rsa: 2.87900s
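The listings above are easy to compare programmatically. The Python sketch below parses the "name: value" lines as they are printed in this post and computes per-benchmark slowdown factors between two cards. The helper names `parse_timings` and `slowdown` are mine, and only three rows per card are pasted in to keep the example short.

```python
def parse_timings(listing: str) -> dict:
    """Parse lines such as 'time_vm_perf: 0.73887s' into {name: seconds}."""
    results = {}
    for line in listing.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name.startswith("time_"):
            continue  # skip blank lines and card headings
        results[name] = float(value.strip().rstrip("s"))
    return results


def slowdown(base: dict, other: dict) -> dict:
    """How many times slower `other` is than `base`, per shared benchmark."""
    return {name: other[name] / base[name] for name in base if name in other}


jcop41_v22 = parse_timings("""
NXP JCOP41 v2.2/72k (no longer available)
time_post_buffer: 0.08280s
time_simple_apdu: 0.00515s
time_vm_perf: 0.73887s
""")
jcop31_v22 = parse_timings("""
NXP JCOP31 v2.2
time_post_buffer: 0.88864s
time_simple_apdu: 0.02813s
time_vm_perf: 1.84374s
""")

for name, factor in sorted(slowdown(jcop41_v22, jcop31_v22).items()):
    print(f"{name}: JCOP31 is {factor:.1f}x slower than JCOP41 v2.2")
```

The gap between the two chips varies a lot by benchmark, so a single speed-up figure hides how much the workload matters.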

7 comments:

  1. Hi Victor

    I'm also playing with JCOP cards, and already have JCOP 41 2.2.1 cards. Do you know where I could purchase the new JCOP 41/31 with the OS 2.3.1 or 2.4.1 from NXP? It seems impossible to locate where to buy those cards.

    Thanks in advance.

  2. @jax: I read about the new cards, but I don't know where to find them, either. I'd like to get my hands on them too, to see if they're faster, and if they have AES.

    Sorry I can't be of any help!

  3. Your measurements can hardly be taken as proof that JCOP performance depends on the interface. You don't know the JCOP configuration of the products you measured, or even the underlying chip technology. Here are some points to consider:
    - JCOP v2.2.1 includes more security countermeasures
    - Underlying HW platform, the SmartMX, has asynchronous design. Specifically it affects performance in the 'free running mode', where the chip runs as fast as possible depending on the energy it has.
    - Most performance issues are based on the applet coding style (key and cipher initialization, memory usage, ..,). I cannot find the source of your performance applet (tc.cap).
    - You're not only measuring the smart card OS, but also your off-card program, computer and reader.
    - There is not much information on your test strategy and equipment used.
    - The statement that native code would be "likely on the order of 20X" faster is highly speculative without any proof.

  4. @lexdabear: Thank you for your feedback! I have seen your posts in various smart-card forums, and I know you are very knowledgeable on smart-card issues. I am honored that you read my blog post, and I'm sorry if the content is confusing.

    My main point was that if you're a smart-card application developer, it's very hard to know the performance of the chips that you're buying.

    I supported this main point with examples where different revisions / configurations of the same chip gave different performance results.

    I did not say that the software must be slower. There's certainly a possibility that very different chips are marketed under what appears to be the same name. My point was that developers need to know this, and need to measure the performance on the exact chip they're getting.

    The source code for the JavaCard applet that I measured is at http://github.com/costan/tem_fw and the driver is at http://github.com/costan/tem_ruby/ (the benchmarks are in lib/tem/benchmarks).

    In my TEM paper, I say that the bytecode interpreter that I wrote in JavaCard would likely run 20x faster if it ran directly on native hardware. I did also say that crypto is probably implemented natively, and would not get the same speed-up.

    I came up with the 20X number based on what I figured the JavaCard VM has to do to run my interpreter, versus what my interpreter would look like if I could code it in assembly or do native code generation.

    I'm wondering if you could help me get my hands on the proper SDKs to write native code for these chips. Then I could port my application, measure, and remove the speculation.

    I tried approaching Atmel and Infineon, but I'm having trouble getting them to pay attention to my request. That's probably because I'm a PhD student, and I don't have an order of 1 million units.

    It's disappointing and a bit infuriating that cable pirates have SDKs for these smart-card chips, but a security researcher like me can't get them. Do you think you could help me get my hands on an SDK?

    Thanks so much!

  5. Can you post the applets you used for speed testing somewhere? I have some Gemalto TOP cards and would like to compare their performance with JCOP.

  6. your blog is very nice Well, it’s amazing. The miracle has been done. Well done.
    --------------------------------
    smart card
    Does anyone know where I can find deep technical information about smart cards? I'm doing a report for the company I'm working for.

  7. Hi Victor
    I am also new to Java Card development. I am playing with a JCOP 31/36 contactless smart card. I encrypted some data on the card using RSA. When I used RSA-2048, it always threw an exception; it seems to need too much power. Then I switched to RSA-512. That works, but most of the time I can't receive the data, and sometimes it returns 64 bytes. I am using a Nexus S phone to send the data. I want to know which contactless JCOP card has the lowest power consumption.
