434 lines
19 KiB
Plaintext
434 lines
19 KiB
Plaintext
|
page.title=Designing for Performance
|
||
|
@jd:body
|
||
|
|
||
|
<div id="qv-wrapper">
|
||
|
<div id="qv">
|
||
|
|
||
|
<h2>In this document</h2>
|
||
|
<ol>
|
||
|
<li><a href="#intro">Introduction</a></li>
|
||
|
<li><a href="#optimize_judiciously">Optimize Judiciously</a></li>
|
||
|
<li><a href="#object_creation">Avoid Creating Objects</a></li>
|
||
|
<li><a href="#myths">Performance Myths</a></li>
|
||
|
<li><a href="#prefer_static">Prefer Static Over Virtual</a></li>
|
||
|
<li><a href="#internal_get_set">Avoid Internal Getters/Setters</a></li>
|
||
|
<li><a href="#use_final">Use Static Final For Constants</a></li>
|
||
|
<li><a href="#foreach">Use Enhanced For Loop Syntax</a></li>
|
||
|
<li><a href="#avoid_enums">Avoid Enums Where You Only Need Ints</a></li>
|
||
|
<li><a href="#package_inner">Use Package Scope with Inner Classes</a></li>
|
||
|
<li><a href="#avoidfloat">Use Floating-Point Judiciously</a> </li>
|
||
|
<li><a href="#library">Know And Use The Libraries</a></li>
|
||
|
<li><a href="#native_methods">Use Native Methods Judiciously</a></li>
|
||
|
<li><a href="#closing_notes">Closing Notes</a></li>
|
||
|
</ol>
|
||
|
|
||
|
</div>
|
||
|
</div>
|
||
|
|
||
|
<p>An Android application will run on a mobile device with limited computing
|
||
|
power and storage, and constrained battery life. Because of
|
||
|
this, it should be <em>efficient</em>. Battery life is one reason you might
|
||
|
want to optimize your app even if it already seems to run "fast enough".
|
||
|
Battery life is important to users, and Android's battery usage breakdown
|
||
|
means users will know if your app is responsible draining their battery.</p>
|
||
|
|
||
|
<p>Note that although this document primarily covers micro-optimizations,
|
||
|
these will almost never make or break your software. Choosing the right
|
||
|
algorithms and data structures should always be your priority, but is
|
||
|
outside the scope of this document.</p>
|
||
|
|
||
|
<a name="intro" id="intro"></a>
|
||
|
<h2>Introduction</h2>
|
||
|
|
||
|
<p>There are two basic rules for writing efficient code:</p>
|
||
|
<ul>
|
||
|
<li>Don't do work that you don't need to do.</li>
|
||
|
<li>Don't allocate memory if you can avoid it.</li>
|
||
|
</ul>
|
||
|
|
||
|
<h2 id="optimize_judiciously">Optimize Judiciously</h2>
|
||
|
|
||
|
<p>This document is about Android-specific micro-optimization, so it assumes
|
||
|
that you've already used profiling to work out exactly what code needs to be
|
||
|
optimized, and that you already have a way to measure the effect (good or bad)
|
||
|
of any changes you make. You only have so much engineering time to invest, so
|
||
|
it's important to know you're spending it wisely.
|
||
|
|
||
|
<p>(See <a href="#closing_notes">Closing Notes</a> for more on profiling and
|
||
|
writing effective benchmarks.)
|
||
|
|
||
|
<p>This document also assumes that you made the best decisions about data
|
||
|
structures and algorithms, and that you've also considered the future
|
||
|
performance consequences of your API decisions. Using the right data
|
||
|
structures and algorithms will make more difference than any of the advice
|
||
|
here, and considering the performance consequences of your API decisions will
|
||
|
make it easier to switch to better implementations later (this is more
|
||
|
important for library code than for application code).
|
||
|
|
||
|
<p>(If you need that kind of advice, see Josh Bloch's <em>Effective Java</em>,
|
||
|
item 47.)</p>
|
||
|
|
||
|
<p>One of the trickiest problems you'll face when micro-optimizing an Android
|
||
|
app is that your app is pretty much guaranteed to be running on multiple
|
||
|
hardware platforms. Different versions of the VM running on different
|
||
|
processors running at different speeds. It's not even generally the case
|
||
|
that you can simply say "device X is a factor F faster/slower than device Y",
|
||
|
and scale your results from one device to others. In particular, measurement
|
||
|
on the emulator tells you very little about performance on any device. There
|
||
|
are also huge differences between devices with and without a JIT: the "best"
|
||
|
code for a device with a JIT is not always the best code for a device
|
||
|
without.</p>
|
||
|
|
||
|
<p>If you want to know how your app performs on a given device, you need to
|
||
|
test on that device.</p>
|
||
|
|
||
|
<a name="object_creation"></a>
|
||
|
<h2>Avoid Creating Objects</h2>
|
||
|
|
||
|
<p>Object creation is never free. A generational GC with per-thread allocation
|
||
|
pools for temporary objects can make allocation cheaper, but allocating memory
|
||
|
is always more expensive than not allocating memory.</p>
|
||
|
|
||
|
<p>If you allocate objects in a user interface loop, you will force a periodic
|
||
|
garbage collection, creating little "hiccups" in the user experience.</p>
|
||
|
|
||
|
<p>Thus, you should avoid creating object instances you don't need to. Some
|
||
|
examples of things that can help:</p>
|
||
|
|
||
|
<ul>
|
||
|
<li>When extracting strings from a set of input data, try
|
||
|
to return a substring of the original data, instead of creating a copy.
|
||
|
You will create a new String object, but it will share the char[]
|
||
|
with the data.</li>
|
||
|
<li>If you have a method returning a string, and you know that its result
|
||
|
will always be appended to a StringBuffer anyway, change your signature
|
||
|
and implementation so that the function does the append directly,
|
||
|
instead of creating a short-lived temporary object.</li>
|
||
|
</ul>
|
||
|
|
||
|
<p>A somewhat more radical idea is to slice up multidimensional arrays into
|
||
|
parallel single one-dimension arrays:</p>
|
||
|
|
||
|
<ul>
|
||
|
<li>An array of ints is a much better than an array of Integers,
|
||
|
but this also generalizes to the fact that two parallel arrays of ints
|
||
|
are also a <strong>lot</strong> more efficient than an array of (int,int)
|
||
|
objects. The same goes for any combination of primitive types.</li>
|
||
|
<li>If you need to implement a container that stores tuples of (Foo,Bar)
|
||
|
objects, try to remember that two parallel Foo[] and Bar[] arrays are
|
||
|
generally much better than a single array of custom (Foo,Bar) objects.
|
||
|
(The exception to this, of course, is when you're designing an API for
|
||
|
other code to access; in those cases, it's usually better to trade
|
||
|
correct API design for a small hit in speed. But in your own internal
|
||
|
code, you should try and be as efficient as possible.)</li>
|
||
|
</ul>
|
||
|
|
||
|
<p>Generally speaking, avoid creating short-term temporary objects if you
|
||
|
can. Fewer objects created mean less-frequent garbage collection, which has
|
||
|
a direct impact on user experience.</p>
|
||
|
|
||
|
<a name="myths" id="myths"></a>
|
||
|
<h2>Performance Myths</h2>
|
||
|
|
||
|
<p>Previous versions of this document made various misleading claims. We
|
||
|
address some of them here.</p>
|
||
|
|
||
|
<p>On devices without a JIT, it is true that invoking methods via a
|
||
|
variable with an exact type rather than an interface is slightly more
|
||
|
efficient. (So, for example, it was cheaper to invoke methods on a
|
||
|
<code>HashMap map</code> than a <code>Map map</code>, even though in both
|
||
|
cases the map was a <code>HashMap</code>.) It was not the case that this
|
||
|
was 2x slower; the actual difference was more like 6% slower. Furthermore,
|
||
|
the JIT makes the two effectively indistinguishable.</p>
|
||
|
|
||
|
<p>On devices without a JIT, caching field accesses is about 20% faster than
|
||
|
repeatedly accesssing the field. With a JIT, field access costs about the same
|
||
|
as local access, so this isn't a worthwhile optimization unless you feel it
|
||
|
makes your code easier to read. (This is true of final, static, and static
|
||
|
final fields too.)
|
||
|
|
||
|
<a name="prefer_static" id="prefer_static"></a>
|
||
|
<h2>Prefer Static Over Virtual</h2>
|
||
|
|
||
|
<p>If you don't need to access an object's fields, make your method static.
|
||
|
Invocations will be about 15%-20% faster.
|
||
|
It's also good practice, because you can tell from the method
|
||
|
signature that calling the method can't alter the object's state.</p>
|
||
|
|
||
|
<a name="internal_get_set" id="internal_get_set"></a>
|
||
|
<h2>Avoid Internal Getters/Setters</h2>
|
||
|
|
||
|
<p>In native languages like C++ it's common practice to use getters (e.g.
|
||
|
<code>i = getCount()</code>) instead of accessing the field directly (<code>i
|
||
|
= mCount</code>). This is an excellent habit for C++, because the compiler can
|
||
|
usually inline the access, and if you need to restrict or debug field access
|
||
|
you can add the code at any time.</p>
|
||
|
|
||
|
<p>On Android, this is a bad idea. Virtual method calls are expensive,
|
||
|
much more so than instance field lookups. It's reasonable to follow
|
||
|
common object-oriented programming practices and have getters and setters
|
||
|
in the public interface, but within a class you should always access
|
||
|
fields directly.</p>
|
||
|
|
||
|
<p>Without a JIT, direct field access is about 3x faster than invoking a
|
||
|
trivial getter. With the JIT (where direct field access is as cheap as
|
||
|
accessing a local), direct field access is about 7x faster than invoking a
|
||
|
trivial getter. This is true in Froyo, but will improve in the future when
|
||
|
the JIT inlines getter methods.</p>
|
||
|
|
||
|
<a name="use_final" id="use_final"></a>
|
||
|
<h2>Use Static Final For Constants</h2>
|
||
|
|
||
|
<p>Consider the following declaration at the top of a class:</p>
|
||
|
|
||
|
<pre>static int intVal = 42;
|
||
|
static String strVal = "Hello, world!";</pre>
|
||
|
|
||
|
<p>The compiler generates a class initializer method, called
|
||
|
<code><clinit></code>, that is executed when the class is first used.
|
||
|
The method stores the value 42 into <code>intVal</code>, and extracts a
|
||
|
reference from the classfile string constant table for <code>strVal</code>.
|
||
|
When these values are referenced later on, they are accessed with field
|
||
|
lookups.</p>
|
||
|
|
||
|
<p>We can improve matters with the "final" keyword:</p>
|
||
|
|
||
|
<pre>static final int intVal = 42;
|
||
|
static final String strVal = "Hello, world!";</pre>
|
||
|
|
||
|
<p>The class no longer requires a <code><clinit></code> method,
|
||
|
because the constants go into static field initializers in the dex file.
|
||
|
Code that refers to <code>intVal</code> will use
|
||
|
the integer value 42 directly, and accesses to <code>strVal</code> will
|
||
|
use a relatively inexpensive "string constant" instruction instead of a
|
||
|
field lookup. (Note that this optimization only applies to primitive types and
|
||
|
<code>String</code> constants, not arbitrary reference types. Still, it's good
|
||
|
practice to declare constants <code>static final</code> whenever possible.)</p>
|
||
|
|
||
|
<a name="foreach" id="foreach"></a>
|
||
|
<h2>Use Enhanced For Loop Syntax</h2>
|
||
|
|
||
|
<p>The enhanced for loop (also sometimes known as "for-each" loop) can be used
|
||
|
for collections that implement the Iterable interface and for arrays.
|
||
|
With collections, an iterator is allocated to make interface calls
|
||
|
to hasNext() and next(). With an ArrayList, a hand-written counted loop is
|
||
|
about 3x faster (with or without JIT), but for other collections the enhanced
|
||
|
for loop syntax will be exactly equivalent to explicit iterator usage.</p>
|
||
|
|
||
|
<p>There are several alternatives for iterating through an array:</p>
|
||
|
|
||
|
<pre> static class Foo {
|
||
|
int mSplat;
|
||
|
}
|
||
|
Foo[] mArray = ...
|
||
|
|
||
|
public void zero() {
|
||
|
int sum = 0;
|
||
|
for (int i = 0; i < mArray.length; ++i) {
|
||
|
sum += mArray[i].mSplat;
|
||
|
}
|
||
|
}
|
||
|
|
||
|
public void one() {
|
||
|
int sum = 0;
|
||
|
Foo[] localArray = mArray;
|
||
|
int len = localArray.length;
|
||
|
|
||
|
for (int i = 0; i < len; ++i) {
|
||
|
sum += localArray[i].mSplat;
|
||
|
}
|
||
|
}
|
||
|
|
||
|
public void two() {
|
||
|
int sum = 0;
|
||
|
for (Foo a : mArray) {
|
||
|
sum += a.mSplat;
|
||
|
}
|
||
|
}
|
||
|
</pre>
|
||
|
|
||
|
<p><strong>zero()</strong> is slowest, because the JIT can't yet optimize away
|
||
|
the cost of getting the array length once for every iteration through the
|
||
|
loop.</p>
|
||
|
|
||
|
<p><strong>one()</strong> is faster. It pulls everything out into local
|
||
|
variables, avoiding the lookups. Only the array length offers a performance
|
||
|
benefit.</p>
|
||
|
|
||
|
<p><strong>two()</strong> is fastest for devices without a JIT, and
|
||
|
indistinguishable from <strong>one()</strong> for devices with a JIT.
|
||
|
It uses the enhanced for loop syntax introduced in version 1.5 of the Java
|
||
|
programming language.</p>
|
||
|
|
||
|
<p>To summarize: use the enhanced for loop by default, but consider a
|
||
|
hand-written counted loop for performance-critical ArrayList iteration.</p>
|
||
|
|
||
|
<p>(See also <em>Effective Java</em> item 46.)</p>
|
||
|
|
||
|
<a name="avoid_enums" id="avoid_enums"></a>
|
||
|
<h2>Avoid Enums Where You Only Need Ints</h2>
|
||
|
|
||
|
<p>Enums are very convenient, but unfortunately can be painful when size
|
||
|
and speed matter. For example, this:</p>
|
||
|
|
||
|
<pre>public enum Shrubbery { GROUND, CRAWLING, HANGING }</pre>
|
||
|
|
||
|
<p>adds 740 bytes to your .dex file compared to the equivalent class
|
||
|
with three public static final ints. On first use, the
|
||
|
class initializer invokes the <init> method on objects representing each
|
||
|
of the enumerated values. Each object gets its own static field, and the full
|
||
|
set is stored in an array (a static field called "$VALUES"). That's a lot of
|
||
|
code and data, just for three integers. Additionally, this:</p>
|
||
|
|
||
|
<pre>Shrubbery shrub = Shrubbery.GROUND;</pre>
|
||
|
|
||
|
<p>causes a static field lookup. If "GROUND" were a static final int,
|
||
|
the compiler would treat it as a known constant and inline it.</p>
|
||
|
|
||
|
<p>The flip side, of course, is that with enums you get nicer APIs and
|
||
|
some compile-time value checking. So, the usual trade-off applies: you should
|
||
|
by all means use enums for public APIs, but try to avoid them when performance
|
||
|
matters.</p>
|
||
|
|
||
|
<p>If you're using <code>Enum.ordinal</code>, that's usually a sign that you
|
||
|
should be using ints instead. As a rule of thumb, if an enum doesn't have a
|
||
|
constructor and doesn't define its own methods, and it's used in
|
||
|
performance-critical code, you should consider <code>static final int</code>
|
||
|
constants instead.</p>
|
||
|
|
||
|
<a name="package_inner" id="package_inner"></a>
|
||
|
<h2>Use Package Scope with Inner Classes</h2>
|
||
|
|
||
|
<p>Consider the following class definition:</p>
|
||
|
|
||
|
<pre>public class Foo {
|
||
|
private int mValue;
|
||
|
|
||
|
public void run() {
|
||
|
Inner in = new Inner();
|
||
|
mValue = 27;
|
||
|
in.stuff();
|
||
|
}
|
||
|
|
||
|
private void doStuff(int value) {
|
||
|
System.out.println("Value is " + value);
|
||
|
}
|
||
|
|
||
|
private class Inner {
|
||
|
void stuff() {
|
||
|
Foo.this.doStuff(Foo.this.mValue);
|
||
|
}
|
||
|
}
|
||
|
}</pre>
|
||
|
|
||
|
<p>The key things to note here are that we define an inner class (Foo$Inner)
|
||
|
that directly accesses a private method and a private instance field
|
||
|
in the outer class. This is legal, and the code prints "Value is 27" as
|
||
|
expected.</p>
|
||
|
|
||
|
<p>The problem is that the VM considers direct access to Foo's private members
|
||
|
from Foo$Inner to be illegal because Foo and Foo$Inner are different classes,
|
||
|
even though the Java language allows an inner class to access an outer class'
|
||
|
private members. To bridge the gap, the compiler generates a couple of
|
||
|
synthetic methods:</p>
|
||
|
|
||
|
<pre>/*package*/ static int Foo.access$100(Foo foo) {
|
||
|
return foo.mValue;
|
||
|
}
|
||
|
/*package*/ static void Foo.access$200(Foo foo, int value) {
|
||
|
foo.doStuff(value);
|
||
|
}</pre>
|
||
|
|
||
|
<p>The inner-class code calls these static methods whenever it needs to
|
||
|
access the "mValue" field or invoke the "doStuff" method in the outer
|
||
|
class. What this means is that the code above really boils down to a case
|
||
|
where you're accessing member fields through accessor methods instead of
|
||
|
directly. Earlier we talked about how accessors are slower than direct field
|
||
|
accesses, so this is an example of a certain language idiom resulting in an
|
||
|
"invisible" performance hit.</p>
|
||
|
|
||
|
<p>We can avoid this problem by declaring fields and methods accessed
|
||
|
by inner classes to have package scope, rather than private scope.
|
||
|
This runs faster and removes the overhead of the generated methods.
|
||
|
(Unfortunately it also means the fields could be accessed directly by other
|
||
|
classes in the same package, which runs counter to the standard
|
||
|
practice of making all fields private. Once again, if you're
|
||
|
designing a public API you might want to carefully consider using this
|
||
|
optimization.)</p>
|
||
|
|
||
|
<a name="avoidfloat" id="avoidfloat"></a>
|
||
|
<h2>Use Floating-Point Judiciously</h2>
|
||
|
|
||
|
<p>As a rule of thumb, floating-point is about 2x slower than integer on
|
||
|
Android devices. This is true on a FPU-less, JIT-less G1 and a Nexus One with
|
||
|
an FPU and the JIT. (Of course, absolute speed difference between those two
|
||
|
devices is about 10x for arithmetic operations.)</p>
|
||
|
|
||
|
<p>In speed terms, there's no difference between <code>float</code> and
|
||
|
<code>double</code> on the more modern hardware. Space-wise, <code>double</code>
|
||
|
is 2x larger. As with desktop machines, assuming space isn't an issue, you
|
||
|
should prefer <code>double</code> to <code>float</code>.</p>
|
||
|
|
||
|
<p>Also, even for integers, some chips have hardware multiply but lack
|
||
|
hardware divide. In such cases, integer division and modulus operations are
|
||
|
performed in software — something to think about if you're designing a
|
||
|
hash table or doing lots of math.</p>
|
||
|
|
||
|
<a name="library" id="library"></a>
|
||
|
<h2>Know And Use The Libraries</h2>
|
||
|
|
||
|
<p>In addition to all the usual reasons to prefer library code over rolling
|
||
|
your own, bear in mind that the system is at liberty to replace calls
|
||
|
to library methods with hand-coded assembler, which may be better than the
|
||
|
best code the JIT can produce for the equivalent Java. The typical example
|
||
|
here is <code>String.indexOf</code> and friends, which Dalvik replaces with
|
||
|
an inlined intrinsic. Similarly, the <code>System.arraycopy</code> method
|
||
|
is about 9x faster than a hand-coded loop on a Nexus One with the JIT.</p>
|
||
|
|
||
|
<p>(See also <em>Effective Java</em> item 47.)</p>
|
||
|
|
||
|
<a name="native_methods" id="native_methods"></a>
|
||
|
<h2>Use Native Methods Judiciously</h2>
|
||
|
|
||
|
<p>Native code isn't necessarily more efficient than Java. For one thing,
|
||
|
there's a cost associated with the Java-native transition, and the JIT can't
|
||
|
optimize across these boundaries. If you're allocating native resources (memory
|
||
|
on the native heap, file descriptors, or whatever), it can be significantly
|
||
|
more difficult to arrange timely collection of these resources. You also
|
||
|
need to compile your code for each architecture you wish to run on (rather
|
||
|
than rely on it having a JIT). You may even have to compile multiple versions
|
||
|
for what you consider the same architecture: native code compiled for the ARM
|
||
|
processor in the G1 can't take full advantage of the ARM in the Nexus One, and
|
||
|
code compiled for the ARM in the Nexus One won't run on the ARM in the G1.</p>
|
||
|
|
||
|
<p>Native code is primarily useful when you have an existing native codebase
|
||
|
that you want to port to Android, not for "speeding up" parts of a Java app.</p>
|
||
|
|
||
|
<p>(See also <em>Effective Java</em> item 54.)</p>
|
||
|
|
||
|
<a name="closing_notes" id="closing_notes"></a>
|
||
|
<h2>Closing Notes</h2>
|
||
|
|
||
|
<p>One last thing: always measure. Before you start optimizing, make sure you
|
||
|
have a problem. Make sure you can accurately measure your existing performance,
|
||
|
or you won't be able to measure the benefit of the alternatives you try.</p>
|
||
|
|
||
|
<p>Every claim made in this document is backed up by a benchmark. The source
|
||
|
to these benchmarks can be found in the <a href="http://code.google.com/p/dalvik/source/browse/#svn/trunk/benchmarks">code.google.com "dalvik" project</a>.</p>
|
||
|
|
||
|
<p>The benchmarks are built with the
|
||
|
<a href="http://code.google.com/p/caliper/">Caliper</a> microbenchmarking
|
||
|
framework for Java. Microbenchmarks are hard to get right, so Caliper goes out
|
||
|
of its way to do the hard work for you, and even detect some cases where you're
|
||
|
not measuring what you think you're measuring (because, say, the VM has
|
||
|
managed to optimize all your code away). We highly recommend you use Caliper
|
||
|
to run your own microbenchmarks.</p>
|
||
|
|
||
|
<p>You may also find
|
||
|
<a href="{@docRoot}guide/developing/tools/traceview.html">Traceview</a> useful
|
||
|
for profiling, but it's important to realize that it currently disables the JIT,
|
||
|
which may cause it to misattribute time to code that the JIT may be able to win
|
||
|
back. It's especially important after making changes suggested by Traceview
|
||
|
data to ensure that the resulting code actually runs faster when run without
|
||
|
Traceview.
|