Friday, 09 July 2004
The CD/DVD creation process in debian-cd is very, very slow, for two reasons:
The first part is kind of unavoidable - to be able to make an ISO image, you have to actually read in all the data that will go into that image, and then write it out. To make this go faster, you simply need to supply good disk hardware - there's not really much that can be done algorithmically.
The second part is the bit we can do something about. At the moment, the CD creation process includes:
In reverse order:
Steps 6 & 7: I've already written JTE to make step 7 much faster: generate the jigdo files directly from mkisofs while we still have all the data we need (paths to each file), instead of having to work back from the image by brute force. This makes step 6 slightly slower, but the cost of md5summing data we're already reading and writing is not too bad.
Step 5: Phil Hands has modified debian-cd to use
Step 4: Making disks bootable is normally trivial and takes almost no time, so it can be ignored.
Step 3: apt-ftparchive currently generates the md5sums for all the files it will list in a Packages file.
Step 2: working out what files will fit where and creating the CD trees is also reasonably quick these days. Even "copying" the data into place is fast, as we can simply create trees of hard links rather than actually copy the data.
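To illustrate why the hard-link approach is cheap, here is a minimal sketch in Python. It is not debian-cd's own code, and the function name `link_tree` and the directory names are made up for the example. It assumes the source and destination trees live on the same filesystem, since hard links cannot cross filesystems.

```python
import os

def link_tree(src_root, dst_root):
    """Mirror src_root under dst_root using hard links instead of copies.

    Hypothetical helper for illustration only. No file data is read or
    written; each new entry just points at the existing inode.
    """
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # os.link creates a second directory entry for the same inode.
            os.link(os.path.join(dirpath, name),
                    os.path.join(target_dir, name))
```

Because only directory entries are created, the cost scales with the number of files rather than with their total size.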
Step 1: the mirror check is the next thing I'm looking at for a performance gain. It's necessary for release builds, to make sure that the packages and sources that go on the CDs and DVDs exactly match what's on ftp-master.debian.org and haven't been corrupted in transit. However, this step takes a long time, so long that many people disable it when running debian-cd.
What I've done is to move the md5 check to later in the process. My JTE patch already pushes steps 6 and 7 together into one stage and also calculates md5 sums as it goes. The obvious change to make is to check the files at that point. Instead of checking the mirror up-front, simply build a list of files and md5sums and feed that to mkisofs so it can do the work, almost for free. If any files fail to match when we're building the image, fail at that point. I've written support for this, and it will be in JTE 1.6, coming Real Soon Now (TM).
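The idea of checking md5sums while the data is already streaming past can be sketched as follows. This is a Python illustration only, not the actual JTE code inside mkisofs. The names `copy_with_md5_check` and `ChecksumMismatch` are hypothetical, and the expected checksum is assumed to come from a pre-built list of files and md5sums.

```python
import hashlib

class ChecksumMismatch(Exception):
    """Raised when a file's md5sum differs from the expected value."""

def copy_with_md5_check(src_path, out, expected_md5, chunk_size=1 << 20):
    """Write src_path to the open binary stream `out`, hashing as we go.

    Hypothetical helper: the file is read exactly once, and the same
    chunks feed both the output and the md5 digest, so the check adds
    only the CPU cost of hashing.
    """
    digest = hashlib.md5()
    with open(src_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            out.write(chunk)
    actual = digest.hexdigest()
    if actual != expected_md5:
        # Fail at image-build time rather than in a separate up-front pass.
        raise ChecksumMismatch(
            "%s: expected %s, got %s" % (src_path, expected_md5, actual))
    return actual
```

The trade-off is that a corrupted file is only noticed once the image is already partly written, which is acceptable if failures are rare and the build is simply rerun.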
I'm not sure how to progress JTE further - it clearly needs packaging, but that will probably involve forking mkisofs. Joerg is infamously difficult to please in terms of accepting patches for cdrtools, and the current mkisofs maintainers haven't responded to my mails about JTE AT ALL.
In other news, I'm about to commit a debian-cd change to fix a problem I've been seeing: HFS hybrid discs (powerpc and m68k) coming out too big.