Oldskooler Ramblings

the unlikely child born of the home computer wars

The truth about netopsystems

Posted by Trixter on August 19, 2007

I was first made aware of Net Op Systems (currently going by the name of NOS Microsystems Ltd.) when downloading Adobe Acrobat Reader about 3 years ago. I was struck by how small the compressed deliverable was, so, being a compression hobbyist, I did some preliminary analysis and found that they used a considerable amount of context arrangement and prediction (i.e. “solid” mode in rar/7-zip, or the content-specific predictors in PAQ) to get the size down. I recently ran across their product again when downloading the most recent version of Solaris x86; it comes in a 1.1G NOSSO executable package. The sole payload was a 3.1G .ISO image, which meant the compressed deliverable was 37% of the uncompressed size.  This is very impressive, given that the .ISO image is itself filled with .JAR and .BZ2 compressed files. The successful extraction of a workable .ISO file from this compressed deliverable means that NOSSO has to perform the following to work its magic:

  1. Identify the various compressed files in the .ISO wrapper
  2. Extract those compressed files
  3. Decompress the content inside those compressed files
  4. Arrange everything by context (i.e. all ASCII text in one group, all binary executables in another group, etc.)
  5. Compress the entire thing to a proprietary stream, using content-specific prediction for various content groups
  6. Store the original arrangement of the compressed and uncompressed content
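The compression-side steps above can be sketched roughly like this. The file names, the context groupings, and the use of LZMA as the back-end stream are my own stand-ins; NOSSO's actual container format and predictors are proprietary:

```python
# Rough sketch of steps 4-6 above: group payloads by content type, then
# compress everything as one "solid" stream, keeping a manifest so the
# original arrangement can be restored later. LZMA stands in for NOSSO's
# proprietary codec.
import lzma

def group_by_context(files):
    """Arrange payloads so similar content sits together in the stream."""
    order = {".txt": 0, ".xml": 0, ".exe": 1, ".dll": 1}  # text first, binaries next
    return sorted(files, key=lambda f: order.get(f["ext"], 2))

def pack(files):
    manifest = []          # (name, offset, length) for each payload
    blob = bytearray()
    for f in group_by_context(files):
        manifest.append((f["name"], len(blob), len(f["data"])))
        blob += f["data"]
    return manifest, lzma.compress(bytes(blob))  # one solid compressed stream

files = [
    {"name": "setup.exe", "ext": ".exe", "data": b"\x00" * 500},
    {"name": "readme.txt", "ext": ".txt", "data": b"hello " * 100},
]
manifest, packed = pack(files)
# The text file lands first in the stream, and the solid blob shrinks well.
```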

Upon extraction, the NOSSO distributable has to perform the following:

  1. Decompress everything and keep track of it
  2. RECOMPRESS the data originally found in compressed files, so that their effective format is kept the same. There may be small differences due to compression options and implementations, but as long as the end result is usable by the end program (ie. a reassembled .ZIP file is still able to be decompressed into the same contents) then there’s no harm done.
  3. Rearrange the end result back into the original container (in my case an .ISO file)
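Step 2 is the interesting one. A minimal sketch, using zlib as a stand-in for whatever codec the original file actually used:

```python
# Recompressing reconstituted data back into its original format. The
# recompressed bytes may differ from what the publisher started with (here,
# level 1 vs. level 9), but they decompress to the same content, so the
# reassembled file is still usable.
import zlib

original     = zlib.compress(b"payload " * 50, level=9)  # as found in the .ISO
data         = zlib.decompress(original)                 # what NOSSO stores, expanded
recompressed = zlib.compress(data, level=1)              # fast recompression on the user's machine

assert zlib.decompress(recompressed) == data  # functionally identical,
# even though `recompressed` need not equal `original` byte-for-byte
```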

This is why they call their process “reconstitution” instead of “decompression”: the end result, while functionally identical, is usually not bit-for-bit identical. By taking advantage of context, and by recompressing files from less-efficient formats into the more efficient format NOSSO uses internally, they achieve these excellent compression ratios. (In fact, I’ll wager that, to speed up reconstitution times, they use a very fast, less efficient recompression of the files inside the target wrapper, which would inflate them slightly and make the compression ratios look even more “impressive” :-)

What’s the downside? The downside is that this entire process defeats its own purpose. I’ll explain:

NOSSO is marketed as a delivery format that saves everybody bandwidth and, presumably, time. It’s that presumption that allows them to shoot themselves in the foot. While the compressed distributable only took 39 minutes to download on my 6mbit/s cable modem connection, it took a whopping 124 minutes to “reconstitute” on a 2.6GHz P4 with 700MB RAM free (out of 1G RAM total). My total time to get the end result was 163 minutes. (A 2.6GHz machine is not the bleeding edge in 2007, but it’s no slouch either, and is representative of the average system most people use every day.) At its original size, 3.1G, it would have taken me only 104 minutes to download it.

It would have been faster to get the end result had it not been compressed at all.

Now, 6mbit/s is a pretty fast broadband connection, so I understand that skews the results a bit. With a more common broadband connection speed of 3mbit/s, let’s check the numbers again: Compressed download + extraction: 202 minutes. Uncompressed download: 208 minutes. Okay, so it’s break-even at a 3mbit/s connection. But break-even still involves 100% CPU utilization as the thing is decompressed, resulting in an unusable system for two hours, so it’s still not “free”.
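The arithmetic above is easy to check by scaling my measured 6mbit/s download times to other link speeds:

```python
# Break-even calculator using the measured numbers from this post: 39 minutes
# to download the 1.1G package at 6mbit/s, 104 minutes for the raw 3.1G ISO,
# and 124 minutes of reconstitution time on my 2.6GHz P4.
def total_minutes(download_min_at_6mbit, link_mbit, extract_min):
    # measured download time scales inversely with link speed
    return download_min_at_6mbit * 6.0 / link_mbit + extract_min

nosso = total_minutes(39, 3.0, 124)   # compressed download + reconstitution
raw   = total_minutes(104, 3.0, 0)    # uncompressed download, nothing to extract
print(round(nosso), round(raw))       # → 202 208
```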

Is there any benefit to using compression at all? Let’s check both WinRAR and 7-Zip on the original 3.1G unmodified .ISO file:

  • 7-Zip compressed size: 2.68G. Time to download at 3mbit/s: 187 minutes. Decompression time: 14 minutes. Total time to get the end result: 201 minutes.
  • WinRAR compressed size: 2.69G. Time to download at 3mbit/s: 189 minutes. Decompression time: 3 minutes. Total time to get the end result: 192 minutes.

So, at 3mbit/s, the end result was just about the same, except our system was only tied up for 3 or 14 minutes instead of two hours. We’d get even more compression at the same decompression speed if we burst the .ISO apart like NOSSO does, compressed the contents using WinRAR’s or 7-Zip’s “solid” mode, and then reconstituted everything back into an .ISO with a small utility program when done.

My conclusion from all this is that there’s really no point in using NetOpSystems’ product, unless the end-user’s broadband speed is 1mbit/s or slower. But if it’s that slow, the user is already used to ordering DVD-ROMs for delivery instead of trying to download them, right? Or, if the user downloads them anyway, they’re used to firing them off before going to bed, to download overnight. So, again, no need for the product…

…unless you’re the content producer and want to transfer cost (bandwidth) to the end user (time). Which is probably why NetOpSystems is still in business.


15 Responses to “The truth about netopsystems”

  1. phoenix said

    Just looking at that first webpage, it’s clear that this product is about marketing and business, and not the end-user experience. So your reaction is probably typical but of little concern to the content producers. Hopefully we’ll continue to enjoy a variety of options.

  2. James said

    A wonderful review of something which has always annoyed me!

    In the UK, Virgin Media (which recently took over NTL) have announced upgrading all 10Mbit customers to 20Mbit. Every time I go to download something from Adobe I come to the same conclusion as you; it would be faster for me to just download the raw data, since the ‘reconstruction’ takes so much longer, even on a quad core with 4GB RAM!

  3. mpz said

    Oh, they don’t care about *your* bandwidth or *your* CPU time that goes to waste. All they care about is *their* bandwidth that they have to pay per gigabyte for. NOSSO helps quite handily there..

    (Insert obligatory rant about proper content delivery systems like Akamai or even BitTorrent..)

    BTW, I’m pretty sure the end result has to be very nearly if not exactly bit identical to the original. The ISO file has a self-contained filesystem inside it; if the “re-constituted” zip/jar and bz2 archives were sized any differently from the originals, the filesystem references to file sizes and starting sectors would have to be fixed too. This is not an impossible feat with ISO files, but it would certainly be impossible with proprietary file formats that just embed ZIPs etc..

    Therefore I would guess that they simply figure out the exact parameters, and if those are not found, store the files as they are. This is probably the easier way because most ZIP files on the internet and on *nix distribution CDs/DVDs are compressed with zlib (infozip or gzip) – searching through the few options shouldn’t take a prohibitively long amount of time during the compression phase. Chances are most ZIPs are compressed either at the default level or -9.
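A sketch of that parameter search, assuming the archive members are plain zlib/deflate streams (real ZIP members use raw deflate, but the idea is the same):

```python
# Try each deflate level until recompression reproduces the original bytes;
# if none matches (different zlib build, unusual options), store the stream
# verbatim instead of risking a non-identical reconstitution.
import zlib

def find_deflate_level(original_stream):
    data = zlib.decompress(original_stream)
    for level in range(9, -1, -1):        # -9 down to store-only
        if zlib.compress(data, level) == original_stream:
            return level
    return None                           # no match: keep the file as-is

stream = zlib.compress(b"example data " * 40, 9)
assert find_deflate_level(stream) == 9    # found: it was compressed at -9
```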

  4. Trixter said

    James: Amazing that you have 20mbit to your house!! The best we can get in the USA is 6mbit (you can get more but you have to pay business costs, ie. $1200 a month or more).

    mpz: They’re definitely not leaving compressed things alone because the .ISO in question is 80% compressed files (*.JAR and *.BZ2) so if they just left it alone, it would be 2.8G instead of 1.1G. Which is why the RAR/7-Zip tests showed 2.8G. So they most definitely recompress the content, and re-recompress it into the original file format upon “reconstitution”.

  5. mpz said

    Oh, of course they do, I was just saying that they pretty much *must* reproduce the original data exactly (at the “reconstitution” phase, in other words re-recompressing into the “original” zips and bz2s) – which isn’t that big of a feat (there are also others that do it like Precomp and so on).

  6. Brolin Empey said

    Trixter: I am curious how your cable modem can transfer less than 1 bit per second, considering that a bit is a fundamental unit of information. ;) ‘m’ = milli, ‘M’ = mega. Thus, mbit = millibit.

    At least “mbit” is clearly a mistake — “Mbit” was intended — because of the bit’s status as a fundamental unit of information.

    However, using SI multiplicative prefixes without a unit is poor form. Granted, it is usually assumed that such quantities are using units of bytes in the context of data storage, and bits in the context of data transmission. Regardless, it is better to be explicit and unambiguous than implicit and potentially ambiguous — especially when being explicit requires the use of only a /single character/ more. :P

    You are lucky that you did not mention HDD capacities. If you did, you would have been mixing decimal usage and binary misuse of SI multiplicative prefixes. :)

    The CPU utilisation might become less of an issue as multi-core PCs become more common. All Intel Macs, for example, have /at least/ 2 logical processors.

    mpz: .bz2 files are not archives! :) bzip2 is used to compress uncompressed archive files, such as .tar files. This is because both (GNU) tar and bzip2, unlike zip archivers such as PKZIP and InfoZIP, follow the Unix philosophy: each program should do only one task (archiving xor compression), and do it well.

    Trixter: Your wildcard globbing patterns will not work on a standard, case-sensitive *nix file system, since *nix archive files (well, *nix files in general, unless they were e.g. created on DOS, which uses SHOUTING names :)) use lowercase file extensions. ;)

    JAR archives, like ZIP, can be created without compression (store only), so it is possible that some of the .jar files in your disc image are not compressed. Granted, the default seems to be to use compression with at least the jar program from sun-jdk-, which has a -0 option to store only.

  7. Brolin: Good lord, your attention to detail is wasted on picking at Jim’s blog posts!

    Jim: Fascinating post! I wondered how they did it also. It did make me think about how much further you could optimise things – I always wondered why no one worked on a flexible compression scheme (a lossy system, perhaps), and further methods to repack compression schemes, like being more aggressive with Huffman tables in MP3s (I think there was a tool called Rehuff that did that and shaved off 1-3%, iirc).

    The only context I’ve seen this done in is manually ripped warez releases, where audio is recompressed as well as having video stripped out, although I think that stopped with Dreamcast releases (as you well know, the GD-ROM format is more spacious than CD-R).

  8. Alex said

    I’ve just looked for myself and come to a different conclusion to you lot. Nosso talks about GetPlus, which as far as I know hasn’t been used for Adobe or the Sun Solaris ISO (where I got my interest in digging around a bit more). So yes, in those particular cases the companies were just interested in reducing their bandwidth bills rather than improving the result for end users. However, both those cases are ‘free’ tools – one is a large distribution and one is simply a large download.

    I say if there is a business to be made from saving companies bandwidth bills then so be it, and if it means they don’t have to change their current distribution models or dabble in bittorrent then I can understand why they’d go for it.

  9. Jorge said

    GetPlus _is_ used with Acrobat 9.

    I came across this page while searching for information on NOSSO. Man, I have hated that stuff for ages. It’s _so_ slow and annoying. I would think they should worry more about customer perception than saving a tiny bit of bandwidth. Currently it looks like their software isn’t very well designed because it’s so slow to install.

  10. H said

    I worked for NOS in 2004 and was one of those who were responsible for the Mac OS X release and worked also on the MS Windows version of NOSSO (whose name was FEAD back then).

    I’m still bound by NDAs so I’m not allowed to say much; however, your observations are mostly correct. As those were the worst months of my career as a software engineer (it’s one of those classical graduate-grinders), I wouldn’t be able to say many nice things anyway. ;)

    Their products become obsolete as bandwidths grow so I’m leaning back and enjoying their fall. ;)

  11. Trixter said

    Thanks for the confirmation that my observations were correct :-)

  12. Max said

    The latest versions of Adobe Reader use a new version of Nosso – they seem to start the reconstitution while still downloading, which substantially reduces the processing after the end of the download.
    As you say, still mainly of benefit to the publisher rather than the user.
    I think they must have to reconstruct a bit-for-bit copy, as usually all the files in an installation package have CRCs.

  13. Yuhong Bao said

    “The latest versions of Adobe Reader use a new version of Nosso – they seem to start the reconstitution while still downloading, which substantially reduces the processing after the end of the download.
    As you say, still mainly of benefit to the publisher rather than the user.”
    A good step forward, but I have a better idea: how about benchmarking both the system and the internet connection, and using that to determine how much compression to use? If, for example, the system is slow but the internet connection is fast, less compression can be used; if it is the opposite, more compression can be used.

  14. Trixter said

    A good idea, but as I suspected, this would only benefit the end-user and not the publisher. The publisher will always want to save the most bandwidth.

  15. yuhong said

    To be honest, I think the first version of Acrobat Reader to do this was 6.0, which dates back to 2003. I think dial-up modems were still common back then.
