JungleDragon image upload dramatically improved
FERDY CHRISTANT - JAN 2, 2010 (03:04:59 PM)
My project JungleDragon is about user-generated content, images of nature to be specific, so users need to be able to upload these images. I completed this functionality about a year ago, since it is hard to test anything at all without it. The upload process is quite heavy and complex; the following sequence of events occurs in real time:
- User selects file to upload
- File is transferred to the JungleDragon web server
- A process cuts out five additional formats (thumbs and such)
- The resulting 6 files are transferred to Amazon S3, their permanent storage location
- Database records are updated (image table, imagefile table, tags table and user karma table)
- The user is redirected to the newly uploaded image
Last week I used the upload form to upload the following test image:
It is a 1.6 MB JPEG with a very high resolution of 3872 x 2592 pixels. I measured the total upload process at 24 seconds, which is awfully slow given that my client machine is on the same home network as my development server. Painful as it was, I had to perform some major heart surgery to improve this.
Scheduled Amazon S3 transfer
My first assumption about the upload time was that the real-time transfer of the six files to Amazon S3 would take a lot of time. In essence we're doing a double upload: one to the web server, and then one to the Amazon S3 server.
I decided to move the S3 transfer logic to a scheduled process, a cron job in Linux terms. The cron job looks in the database for image files that are not yet transferred (transferred = 0), makes the transfer, and then updates the image location in the database.
This brings good news: the total upload time decreased by 11 seconds, making it almost twice as fast. What's also nice is that there is no user impact. As long as the images are not yet transferred to S3, the local images are served. Users can directly work with the uploaded image in all sizes. Once the S3 transfer is made, the image will be served from Amazon S3 but the user will not be able to tell.
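A minimal sketch of such a scheduled job, written in Python purely for illustration (it is not the actual JungleDragon code). The table layout, the column names other than `transferred`, and the `upload` callable are assumptions; in a real setup `upload` would wrap an S3 client.

```python
import sqlite3
from typing import Callable

# Assumed schema: only the "transferred" flag is named in the post.
SCHEMA = """
CREATE TABLE imagefile (
    id INTEGER PRIMARY KEY,
    local_path TEXT NOT NULL,
    remote_url TEXT,
    transferred INTEGER NOT NULL DEFAULT 0
)
"""

def transfer_pending(conn: sqlite3.Connection,
                     upload: Callable[[str], str]) -> int:
    """Transfer every file with transferred = 0 and record its new location.

    `upload` (hypothetical) takes a local path and returns the remote URL.
    Returns the number of files transferred in this run.
    """
    rows = conn.execute(
        "SELECT id, local_path FROM imagefile WHERE transferred = 0"
    ).fetchall()
    moved = 0
    for file_id, local_path in rows:
        try:
            remote_url = upload(local_path)
        except OSError:
            # Leave the row untouched so the next scheduled run retries it.
            continue
        conn.execute(
            "UPDATE imagefile SET transferred = 1, remote_url = ? WHERE id = ?",
            (remote_url, file_id),
        )
        conn.commit()
        moved += 1
    return moved

def serving_url(conn: sqlite3.Connection, file_id: int) -> str:
    """Return the local path until the file has been transferred."""
    transferred, local_path, remote_url = conn.execute(
        "SELECT transferred, local_path, remote_url FROM imagefile WHERE id = ?",
        (file_id,),
    ).fetchone()
    return remote_url if transferred else local_path
```

Because `serving_url` falls back to the local copy, a freshly uploaded image is usable immediately, and a failed transfer simply stays queued for the next run.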
Smarter resizing
It was nice to see the upload time of the test image go down from 24 seconds to 13 seconds, but it is still not good enough. I had wrongly assumed that the remaining 13 seconds would primarily be spent on the upload transfer. In reality, something else was wrong: my resizing logic was not efficient. This is what I did:
- Resize from original size to large size
- Resize from original size to medium size
- Resize from original size to thumbnail size
- Etc...
Because every resize started from the full-size original, each resize action took about 2 seconds. I changed this to a smarter way of resizing:
- Resize from original size to large size
- Resize from large size to medium size
- Resize from medium size to thumbnail size
- Etc...
This way, the total resize duration went down to about 2 seconds, and the total upload duration to about 3 seconds!
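To see why chaining helps, here is a rough cost model in Python that counts how many source pixels each strategy has to read. The target widths are assumptions (the post only names large, medium and thumbnail sizes), and pixel count is only a proxy for real resize time.

```python
# Test image from the post; the target widths below are hypothetical.
ORIGINAL = (3872, 2592)
TARGET_WIDTHS = [1024, 500, 200, 100, 50]  # assumed, largest first

def scaled(size, width):
    """Return the size after scaling to `width`, keeping the aspect ratio."""
    w, h = size
    return (width, max(1, round(h * width / w)))

def pixels(size):
    return size[0] * size[1]

def cost_from_original(original, widths):
    """Every resize reads the full-size original."""
    return len(widths) * pixels(original)

def cost_chained(original, widths):
    """Each resize reads the output of the previous, smaller step."""
    total = 0
    source = original
    for width in widths:
        total += pixels(source)
        source = scaled(source, width)
    return total

if __name__ == "__main__":
    print("from original:", cost_from_original(ORIGINAL, TARGET_WIDTHS))
    print("chained:      ", cost_chained(ORIGINAL, TARGET_WIDTHS))
```

With these assumed widths the chained strategy reads roughly a fifth of the pixels, since only the first resize touches the full original. That is in line with the drop from about 2 seconds per resize to about 2 seconds in total.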
Image compression
Another small but important improvement that I made is to compress images while resizing. By saving at a quality setting of 90%, the image file size goes down by a factor between 2 and 3. This is an important saving since my Amazon bill is based upon disk space and transfer bandwidth. In other words, this saves me money.
User experience
I also made two important changes to the user experience of the upload form:
- The file input control is styled. It looks consistent with the rest of the design. This is something that cannot be done using CSS only.
- There is now a fake upload progress bar (animated GIF) whilst the user waits for the upload. It may feel like cheating, but studies suggest that users experience wait times as shorter when they see something actually happening.
Image engine control
Several settings have been added to my back-end code to configure the "image engine". With a few constants I can control:
- Where images are stored (local only, s3 only or local_s3, which is a failover mode)
- Whether images are transferred to S3 directly or on a schedule
- At what quality rate the images are stored
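As an illustration only, the three settings above could be grouped like this in Python. All names are hypothetical except `local_s3`, which the post mentions, and the defaults are assumptions based on the choices described earlier (scheduled transfer, 90% quality). The post talks about constants; a frozen dataclass is used here so the values cannot change by accident.

```python
from dataclasses import dataclass
from enum import Enum

class StorageMode(Enum):
    LOCAL = "local"        # local only
    S3 = "s3"              # S3 only
    LOCAL_S3 = "local_s3"  # failover mode named in the post

class TransferMode(Enum):
    DIRECT = "direct"        # transfer to S3 during the upload request
    SCHEDULED = "scheduled"  # leave the transfer to the scheduled job

@dataclass(frozen=True)
class ImageEngineConfig:
    storage: StorageMode = StorageMode.LOCAL_S3
    transfer: TransferMode = TransferMode.SCHEDULED
    quality: int = 90  # quality rate in percent

    def __post_init__(self):
        if not 1 <= self.quality <= 100:
            raise ValueError("quality must be between 1 and 100")

    def transfers_during_upload(self) -> bool:
        """True when the upload request itself has to wait for S3."""
        uses_s3 = self.storage in (StorageMode.S3, StorageMode.LOCAL_S3)
        return uses_s3 and self.transfer is TransferMode.DIRECT
```

For example, `ImageEngineConfig(transfer=TransferMode.DIRECT).transfers_during_upload()` is true, which corresponds to the original, slower behaviour.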
The result
The result of these improvements can be seen in this video. In the video I first upload the high resolution image, and then an image of which the resolution is optimized for the web.
You may want to view this enlarged in HD in order to follow what is happening. You should see that the upload experience is quite snappy now.
Conclusion
Making these improvements was painful and took quite a few long nights, but the results are there:
- The total upload duration is sped up by a factor of 8
- The resulting file size is between 30% and 50% lower than before (depending on the quality rate)
- The form looks better and gives better feedback
- I can easily control key aspects of the image engine at runtime
I learned two lessons from this improvement battle:
- Don't assume. Know. Measure and know how things work at the lowest level.
- Persist. Do not give up until you have the result you want.
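In the spirit of the first lesson, a small timing helper makes it easy to see which stage of the upload actually eats the time. This is a generic Python sketch, not the code used for JungleDragon; the stage names in the usage note are placeholders.

```python
import time
from contextlib import contextmanager

class StageTimer:
    """Accumulate wall-clock time per named stage."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self):
        """Return the name of the stage that took the most time."""
        return max(self.durations, key=self.durations.get)
```

Wrapping each step, for example `with timer.stage("resize"): ...` and `with timer.stage("s3_transfer"): ...`, and then printing `timer.durations` replaces guessing with numbers.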



Comments: 3
COMMENT: HENNING HEINZ
JAN 2, 2010 - 16:22:56
You should be prepared that you will get a few job opportunities when your site takes off.
I am also curious how this will work with S3. Amazon S3 scales very well (for Amazon) especially if you generate a lot of traffic.
Thank you for keeping me / us updated.
COMMENT: MARK BARTON
JAN 2, 2010 - 17:15:12
How do you do your cross browser testing? Some form of automated testing?
COMMENT: FERDY
JAN 2, 2010 - 17:55:53
The more I work on JungleDragon, the more I realize that it has the potential to be more than a platform for sharing images about nature; most of it can be reused. I hope it will be successful and am open to opportunities, although the opportunities have to be large for me to give up my current day job. We'll see :)
I'm very happy with the way Amazon S3 works. This way I have no worries about backups or storage scalability, and it also offloads a lot of web traffic, making JungleDragon much easier to scale.
@Mark,
I do cross browser testing manually. I primarily focus on IE 7 and above, Firefox, Chrome and Safari. When I come closer to completion, I will do stricter testing.