{"id":1333,"date":"2018-10-02T15:20:56","date_gmt":"2018-10-02T14:20:56","guid":{"rendered":"https:\/\/rosetta.vn\/short\/?p=1333"},"modified":"2018-10-02T15:20:56","modified_gmt":"2018-10-02T14:20:56","slug":"how-we-rolled-out-one-of-the-largest-python-3-migrations-ever-dropbox-tech-blog","status":"publish","type":"post","link":"https:\/\/rosetta.vn\/short\/2018\/10\/02\/how-we-rolled-out-one-of-the-largest-python-3-migrations-ever-dropbox-tech-blog\/","title":{"rendered":"How we rolled out one of the largest Python 3 migrations ever | Dropbox Tech Blog"},"content":{"rendered":"<blockquote><p>Dropbox is one of the most popular desktop applications in the world: You can install it today on Windows, macOS, and some flavors of Linux. What you may not know is that much of the application is written using Python. In fact, Drew\u2019s very first lines of code for Dropbox were written in Python for Windows using venerable libraries such as\u00a0<code>pywin32<\/code>.<\/p>\n<p>Though we\u2019ve relied on Python 2 for many years (most recently, we used Python 2.7), we began moving to Python 3 back in 2015. This transition is now complete: If you\u2019re using Dropbox today, the application is powered by a Dropbox-customized variant of Python 3.5. This post is the first in a series that explores how we planned, executed, and rolled out one of the largest Python 3 migrations ever.<\/p>\n<h2>Why Python 3?<\/h2>\n<p>Python 3 adoption has long been a subject of debate in the Python community. This is still somewhat true, though it\u2019s now reached\u00a0<a href=\"http:\/\/py3readiness.org\/\" target=\"_blank\" rel=\"noopener\">widespread support<\/a>, with some very popular projects such as Django dropping Python 2 support entirely. As for us, a few key factors influenced our decision to make the jump:<\/p>\n<p><strong>Exciting new features<\/strong><br \/>\nPython 3 has seen rapid innovation. 
Apart from the (very)\u00a0<a href=\"http:\/\/whypy3.com\/\" target=\"_blank\" rel=\"noopener\">long list<\/a>\u00a0of general improvements (e.g. the\u00a0<code>str<\/code>\u00a0vs\u00a0<code>bytes<\/code> rationalization), a few specific features caught our eye:<\/p>\n<ul>\n<li>Type annotation syntax: Our codebase is quite large, so the ability to use type annotations has been important to developer productivity. We\u2019re big fans of\u00a0<a href=\"http:\/\/mypy-lang.org\/\" target=\"_blank\" rel=\"noopener\">MyPy<\/a>\u00a0here at Dropbox, so the ability to natively support type annotations is naturally appealing to us.<\/li>\n<li>Coroutine function syntax: We rely heavily on threading and message-passing\u2014through variants of the Actor pattern and by using\u00a0<code>Future<\/code>s\u2014to build many of our features. The\u00a0<code>asyncio<\/code>\u00a0project and its\u00a0<code>async<\/code>\/<code>await<\/code> syntax could sometimes remove the need for callbacks, leading to cleaner code.<\/li>\n<\/ul>\n<p><strong>Aging toolchains<\/strong><br \/>\nAs Python 2 has aged, the set of toolchains initially compatible for deploying it has largely become obsolete. Due to these factors, continued use of Python 2 was associated with a growing maintenance burden:<\/p>\n<ul>\n<li>The use of older compilers\/runtimes was limiting our ability to upgrade some important dependencies.\n<ul>\n<li>For example, we use Qt on Windows and Linux: Recent versions of Qt require more modern compilers due to the inclusion of Chromium (via QtWebEngine).<\/li>\n<\/ul>\n<\/li>\n<li>As we continued to integrate deeply with the operating system, our inability to rely on more recent versions of these toolchains increased the cost of adoption for newer APIs.\n<ul>\n<li>For example, Python 2 still\u00a0<a href=\"http:\/\/stevedower.id.au\/blog\/building-for-python-3-5\/\" target=\"_blank\" rel=\"noopener\">technically<\/a>\u00a0requires Visual Studio 2008. 
This version is no longer supported by Microsoft and is not compatible with the Windows 10 SDK.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Freezers and scripts<\/h2>\n<p>Initially, we relied on \u201cfreezer\u201d scripts to create the native applications for each of our supported platforms. However, rather than use the native toolchains directly, such as Xcode for macOS, we delegated the creation of platform-compliant binaries to\u00a0<code>py2exe<\/code>\u00a0for Windows,\u00a0<code>py2app<\/code>\u00a0for macOS, and\u00a0<code>bbfreeze<\/code> for Linux. This Python-focused build system was inspired by\u00a0<code>distutils<\/code>: Our application was initially little more than a Python package, so we had a single\u00a0<code>setup.py<\/code>-like script to build it.<\/p>\n<p>Over time, our codebase became more and more heterogeneous. Today, Python is no longer the only language used for development. In fact, our code now consists of a mix of TypeScript\/HTML, Rust, and Python, as well as Objective-C and C++ for some specific platform integrations. To support all these components, this\u00a0<code>setup.py<\/code>\u00a0script\u2014internally named\u00a0<code>build-all.py<\/code>\u2014grew to be so large and messy that it became difficult to maintain.<\/p>\n<p>The tipping point came from changes to\u00a0<em>how<\/em>\u00a0we integrate with each operating system: First, we began introducing increasingly advanced OS extensions\u2014like Smart Sync\u2019s kernel components\u2014that can\u2019t and often shouldn\u2019t be written in Python. Second, vendors like Microsoft and Apple began introducing new requirements for deploying applications that imposed the use of new, more sophisticated and often proprietary tools (e.g. 
code signing).<\/p>\n<p>On macOS, for example, version 10.10 introduced a new app extension for integrating with the Finder:\u00a0<code>[<a href=\"https:\/\/developer.apple.com\/library\/archive\/documentation\/General\/Conceptual\/ExtensibilityPG\/Finder.html\" target=\"_blank\" rel=\"noopener\">FinderSync<\/a>]<\/code>. Not merely an API, a FinderSync extension is a full-blown application package (<code>.appex<\/code>) with custom life cycle rules (i.e. it is launched by the OS) and more stringent requirements for inter-process communication. Put another way: Xcode makes leveraging these extensions easy, while\u00a0<code>py2app<\/code>\u00a0does not support them altogether.<\/p>\n<p>We were therefore faced with two problems:<\/p>\n<ul>\n<li>Our use of Python 2 prevented us from using new toolchains, making using new APIs more costly (e.g. using the Windows Runtime on Windows 10).<\/li>\n<li>Our use of freezer scripts made deploying native code more costly (e.g. building app extensions on macOS).<\/li>\n<\/ul>\n<p>While we knew that we wanted to migrate to Python 3, this left us with a choice: invest in the freezer dependencies to add support for Python 3 (and thus the modern compilers) and platform-specific features (like app extensions), or move away from a Python-centric build system, doing away with \u201cfreezers\u201d altogether. We chose the latter.<\/p>\n<p><em>A note on\u00a0<code>pyinstaller<\/code>:<\/em>\u00a0We seriously considered using it in the early stages of the project, but it did not support Python 3 at the time, and more importantly, it suffers from similar limitations as other freezers. Regardless, it is an impressive project that we simply felt didn\u2019t suit our needs.<\/p>\n<h2>Embedding Python<\/h2>\n<p>To solve this build and deploy problem, we decided on a new architecture to embed the Python runtime in our native application. Rather than delegate this process to the freezers, we would use tooling specific to each platform (e.g. 
Visual Studio on Windows) to build the various entry points ourselves. Further, we would abstract Python code behind a library, aiming to more directly support the \u201cmixing and matching\u201d of various languages.<\/p>\n<p>This would allow us to make use of each platform\u2019s IDEs\/toolchain directly (e.g. to add native targets like FinderSync on macOS) while retaining the ability to conveniently write much of our application logic in Python.<\/p>\n<p>We landed on the following rough structure:<\/p>\n<ul>\n<li>Native entry points: These are compatible with each platform\u2019s application model.\n<ul>\n<li>This includes application extensions, such as COM components on Windows or app extensions on macOS.<\/li>\n<\/ul>\n<\/li>\n<li>Shared libraries written in multiple languages (including Python).<\/li>\n<\/ul>\n<p>On the surface, the application would more closely resemble what the platform expects, while behind various libraries, teams would have more flexibility to use their choice of programming language or tooling.<\/p>\n<p>This architecture\u2019s increased modularity would also provide a key side effect: It would now be possible to deploy both a Python 2 library and a Python 3 library side by side. Tying this back to the Python 3 migration, the process would thus require two steps: first, to implement the new architecture around Python 2, and second, to use it to \u201cswap out\u201d Python 2 in favor of Python 3.<\/p>\n<h2>Step 1: \u201cAnti-freeze\u201d<\/h2>\n<p>Our first step was to stop using the freezer scripts. Both\u00a0<code>bbfreeze<\/code>\u00a0and\u00a0<code>py2exe<\/code>\u00a0lacked Python 3 support at this stage, leaving us little choice. Starting in 2016, we began to gradually make this change.<\/p>\n<p>First, we abstracted away the work of configuring the Python runtime and starting Python threads to a new library named\u00a0<code>libdropbox_bootstrap<\/code>. This library would replicate some of what the freezer scripts provided. 
Though we no longer needed to rely on these scripts wholesale, it was still necessary to provide a minimum basis to run Python code:<\/p>\n<p><strong>Packaging our code for on-device execution<\/strong><br \/>\nThis ensures we ship compiled Python \u201cbytecode\u201d rather than raw Python source. Where each freezer script previously had its own on-disk format, we used this opportunity to introduce a single format for bundling our code across all platforms:<\/p>\n<ul>\n<li>For Python bytecode\u00a0<code>.pyc<\/code>, a single ZIP archive (e.g.\u00a0<code>python-packages-35.zip<\/code>) contains all necessary Python modules.<\/li>\n<li>For native extensions\u00a0<code>.pyd<\/code>\/<code>.so<\/code>, as these are platform-native DLLs, they are installed in a location that guarantees the application can load them without interference.\n<ul>\n<li>On Windows, for example, they are alongside the entry points (i.e.\u00a0<code>Dropbox.exe<\/code>).<\/li>\n<\/ul>\n<\/li>\n<li>Packaging is implemented using the excellent\u00a0<code>modulegraph<\/code>\u00a0(by Ronald Oussoren of\u00a0<code>py2app<\/code>\u00a0and\u00a0<code>PyObjC<\/code>\u00a0fame).<\/li>\n<\/ul>\n<p><strong>Isolating our Python interpreter<\/strong><br \/>\nThis prevents our application from running other on-device Python source. Interestingly, Python 3 makes this type of embedding much simpler. The new\u00a0<code>[<a href=\"https:\/\/docs.python.org\/3\/c-api\/init.html#c.Py_SetPath\" target=\"_blank\" rel=\"noopener\">Py_SetPath<\/a>]<\/code>\u00a0function, for example, allowed us to isolate our code without having to do some of the more complicated work of isolation the freezer scripts had to do on Python 2. To support this in Python 2, we back-ported this function to our custom fork.<\/p>\n<p>Second, we introduced platform-specific entry points\u00a0<code>Dropbox.exe<\/code>,\u00a0<code>Dropbox.app<\/code>, and\u00a0<code>dropboxd<\/code>\u00a0to make use of this library. 
These entry points were built using each platform\u2019s \u201cstandard\u201d tooling: Visual Studio, Xcode, and\u00a0<code>make<\/code>\u00a0were used rather than\u00a0<code>distutils<\/code>, allowing us to remove much of the custom patchwork imposed on the freezer scripts. For example, on Windows, this greatly simplified configuring DEP\/NX for\u00a0<code>Dropbox.exe<\/code>, embedding an application manifest as well as including resources.<\/p>\n<p><em>A note on Windows<\/em>: At this point, continued use of Visual Studio 2008 was becoming highly costly. To transition properly, we needed a version capable of supporting both Python 2 and 3 simultaneously, so we settled on Visual Studio 2013. To support it, we extensively altered our custom fork of Python 2 to make it properly compile using that version. The cost of these changes further reinforced our belief that moving to Python 3 was the right decision.<\/p>\n<h2>Step 2: Hydra<\/h2>\n<p>Successfully making a transition of this size (our application contains over 1 million Python LOCs) and at our scale (hundreds of millions of installs) would require a gradual process: We couldn\u2019t simply \u201cflip a switch\u201d in a single release\u2014this was especially true due to our release process, which deploys new versions to all our users every two weeks. There would have to be a way to expose a small\/growing number of users to Python 3 in order to detect and fix bugs early.<\/p>\n<p>To achieve this, we decided to make it possible to build Dropbox using\u00a0<em>both<\/em>\u00a0Python 2 and 3. 
This entailed:<\/p>\n<ul>\n<li>The ability to ship both Python 2 and Python 3 \u201cpackages,\u201d complete with bytecode and extensions, side by side.<\/li>\n<li>The enforcing of a hybrid Python 2\/3 syntax during the transition.<\/li>\n<\/ul>\n<p>We used the embedded design introduced through the previous step to our advantage: By abstracting away Python into a library and package, we could easily introduce\u00a0<em>another<\/em> variant for another version. Choosing what Python version to use could then be controlled in the entry point itself (e.g.\u00a0<code>Dropbox.app<\/code>) during early initialization.<\/p>\n<p>This was achieved by making the entry point manually link against\u00a0<code>libdropbox_bootstrap<\/code>. On macOS and Linux, for example, we used\u00a0<code>dlopen<\/code>\/<code>dlsym<\/code>\u00a0once a version of Python was chosen. On Windows, we used\u00a0<code>LoadLibrary<\/code>\u00a0and\u00a0<code>GetProcAddress<\/code>.<\/p>\n<p>The choice of a runtime Python interpreter needed to be made before Python was loaded, so we made it possible for it to be influenced using both a command-line argument\u00a0<code>\/py3<\/code> for development purposes and a persistent on-disk setting so it could be controlled by\u00a0<a href=\"https:\/\/blogs.dropbox.com\/tech\/2017\/03\/introducing-stormcrow\/\">Stormcrow<\/a>, our feature-gating system.<\/p>\n<p>With this in place, we were able to dynamically choose the Python version when launching the Dropbox client. This allowed us to set up additional jobs in our CI infrastructure to run unit and integration tests targeting Python 3. We also integrated automated checks into our commit queue to prevent changes from being pushed that would regress Python 3 support.<\/p>\n<p>Once we had gained enough confidence through automated testing, we began rolling out Python 3 to real users. This was achieved by incrementally opting in clients through a remote feature gate. 
We first rolled out the change to Dropboxers, which allowed us to identify and correct a majority of the underlying issues. We later expanded this to a fraction of our Beta population\u2014which is a lot more heterogeneous when it comes to OS versions\u2014eventually expanding to our Stable channel: Within 7 months, all Dropbox installs were running Python 3. In order to maximize quality, we adopted a policy requiring that all bugs identified as migration-related be fully investigated and corrected before expanding the number of exposed users.<\/p>\n<figure id=\"attachment_5035\" class=\"wp-caption alignnone\"><a href=\"https:\/\/dropboxtechblog.files.wordpress.com\/2018\/09\/01-python-rollout-beta.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-5035 size-medium\" src=\"https:\/\/i0.wp.com\/rosetta.vn\/short\/wp-content\/uploads\/sites\/3\/2018\/10\/01-python-rollout-beta.png?resize=650%2C337&#038;ssl=1\" alt=\"\" width=\"650\" height=\"337\" data-attachment-id=\"5035\" data-permalink=\"https:\/\/blogs.dropbox.com\/tech\/2018\/09\/how-we-rolled-out-one-of-the-largest-python-3-migrations-ever\/01-python-rollout-beta\/\" data-orig-file=\"https:\/\/dropboxtechblog.files.wordpress.com\/2018\/09\/01-python-rollout-beta.png\" data-orig-size=\"1688,874\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"01-python-rollout-beta\" data-image-description=\"\" data-medium-file=\"https:\/\/i0.wp.com\/rosetta.vn\/short\/wp-content\/uploads\/sites\/3\/2018\/10\/01-python-rollout-beta.png?resize=650%2C337&#038;ssl=1\" 
data-large-file=\"https:\/\/i0.wp.com\/rosetta.vn\/short\/wp-content\/uploads\/sites\/3\/2018\/10\/01-python-rollout-beta.png?resize=650%2C337&#038;ssl=1\" data-lazy-loaded=\"true\" data-recalc-dims=\"1\" \/><\/a><figcaption class=\"wp-caption-text\">Gradual rollout on the Beta channel<\/figcaption><\/figure>\n<figure id=\"attachment_5034\" class=\"wp-caption alignnone\"><a href=\"https:\/\/dropboxtechblog.files.wordpress.com\/2018\/09\/02-python-rollout-stable.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-5034 size-medium\" src=\"https:\/\/i0.wp.com\/rosetta.vn\/short\/wp-content\/uploads\/sites\/3\/2018\/10\/02-python-rollout-stable.png?resize=650%2C337&#038;ssl=1\" alt=\"\" width=\"650\" height=\"337\" data-attachment-id=\"5034\" data-permalink=\"https:\/\/blogs.dropbox.com\/tech\/2018\/09\/how-we-rolled-out-one-of-the-largest-python-3-migrations-ever\/02-python-rollout-stable\/\" data-orig-file=\"https:\/\/dropboxtechblog.files.wordpress.com\/2018\/09\/02-python-rollout-stable.png\" data-orig-size=\"1688,874\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"02-python-rollout-stable\" data-image-description=\"\" data-medium-file=\"https:\/\/i0.wp.com\/rosetta.vn\/short\/wp-content\/uploads\/sites\/3\/2018\/10\/02-python-rollout-stable.png?resize=650%2C337&#038;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/rosetta.vn\/short\/wp-content\/uploads\/sites\/3\/2018\/10\/02-python-rollout-stable.png?resize=650%2C337&#038;ssl=1\" data-lazy-loaded=\"true\" data-recalc-dims=\"1\" \/><\/a><figcaption 
class=\"wp-caption-text\">Gradual rollout on the Stable channel<\/figcaption><\/figure>\n<p>As of version 52, this process is complete: Python 2 has been removed altogether from Dropbox\u2019s desktop client.<\/p>\n<h2>But wait, there\u2019s more<\/h2>\n<p>There\u2019s much more to tell about this process. In future posts, we\u2019ll look at:<\/p>\n<ul>\n<li>How we report crashes on Windows and macOS and use them to debug both native and Python code.<\/li>\n<li>How we maintained a hybrid Python 2 and 3 syntax, and what tools helped.<\/li>\n<li>Our very best bugs and stories from the Python 3 migration.<\/li>\n<\/ul>\n<\/blockquote>\n<p>Source: <em><a href=\"https:\/\/blogs.dropbox.com\/tech\/2018\/09\/how-we-rolled-out-one-of-the-largest-python-3-migrations-ever\/\">How we rolled out one of the largest Python 3 migrations ever | Dropbox Tech Blog<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Dropbox is one of the most popular desktop applications in the world: You can install it today on Windows, macOS, and some flavors of Linux. 
What you may not know is that much of the application is written using Python.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false},"categories":[30],"tags":[915,511,47,222,914],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p8jhJx-lv","_links":{"self":[{"href":"https:\/\/rosetta.vn\/short\/wp-json\/wp\/v2\/posts\/1333"}],"collection":[{"href":"https:\/\/rosetta.vn\/short\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rosetta.vn\/short\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rosetta.vn\/short\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rosetta.vn\/short\/wp-json\/wp\/v2\/comments?post=1333"}],"version-history":[{"count":1,"href":"https:\/\/rosetta.vn\/short\/wp-json\/wp\/v2\/posts\/1333\/revisions"}],"predecessor-version":[{"id":1336,"href":"https:\/\/rosetta.vn\/short\/wp-json\/wp\/v2\/posts\/1333\/revisions\/1336"}],"wp:attachment":[{"href":"https:\/\/rosetta.vn\/short\/wp-json\/wp\/v2\/media?parent=1333"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rosetta.vn\/short\/wp-json\/wp\/v2\/categories?post=1333"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rosetta.vn\/short\/wp-json\/wp\/v2\/tags?post=1333"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}