tflite micro integrate repo

QingChuanWS
2020-12-09 15:06:27 +08:00
parent 200c0ff460
commit d80d0af1a6
406 changed files with 58235 additions and 1908 deletions


@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -0,0 +1,398 @@
#ifndef FLATBUFFERS_BASE_H_
#define FLATBUFFERS_BASE_H_
// clang-format off
// If activated, this block should be declared and included first.
#if defined(FLATBUFFERS_MEMORY_LEAK_TRACKING) && \
defined(_MSC_VER) && defined(_DEBUG)
// The _CRTDBG_MAP_ALLOC inside <crtdbg.h> will replace
// calloc/free (etc) to its debug version using #define directives.
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>
// Replace operator new by trace-enabled version.
#define DEBUG_NEW new(_NORMAL_BLOCK, __FILE__, __LINE__)
#define new DEBUG_NEW
#endif
#if !defined(FLATBUFFERS_ASSERT)
#include <assert.h>
#define FLATBUFFERS_ASSERT assert
#elif defined(FLATBUFFERS_ASSERT_INCLUDE)
// Include file with forward declaration
#include FLATBUFFERS_ASSERT_INCLUDE
#endif
#ifndef ARDUINO
#include <cstdint>
#endif
#include <cstddef>
#include <cstdlib>
#include <cstring>
#if defined(ARDUINO) && !defined(ARDUINOSTL_M_H)
#include <utility.h>
#else
#include <utility>
#endif
#include <string>
#include <type_traits>
#include <vector>
#include <set>
#include <algorithm>
#include <iterator>
#include <memory>
#ifdef _STLPORT_VERSION
#define FLATBUFFERS_CPP98_STL
#endif
#ifndef FLATBUFFERS_CPP98_STL
#include <functional>
#endif
#include "flatbuffers/stl_emulation.h"
#if defined(__ICCARM__)
#include <intrinsics.h>
#endif
// Note the __clang__ check is needed, because clang presents itself
// as an older GNUC compiler (4.2).
// Clang 3.3 and later implement all of the ISO C++ 2011 standard.
// Clang 3.4 and later implement all of the ISO C++ 2014 standard.
// http://clang.llvm.org/cxx_status.html
// Note the MSVC value '__cplusplus' may be incorrect:
// The '__cplusplus' predefined macro in MSVC is stuck at the value 199711L,
// indicating (erroneously!) that the compiler conformed to the C++98 Standard.
// This value should be correct starting from MSVC2017-15.7-Preview-3.
// The '__cplusplus' macro is valid only from MSVC2017-15.7-Preview-3 onward, and only when the `/Zc:__cplusplus` switch is set.
// Workaround (for details see MSDN):
// Use the _MSC_VER and _MSVC_LANG definition instead of the __cplusplus for compatibility.
// The _MSVC_LANG macro reports the Standard version regardless of the '/Zc:__cplusplus' switch.
#if defined(__GNUC__) && !defined(__clang__)
#define FLATBUFFERS_GCC (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define FLATBUFFERS_GCC 0
#endif
#if defined(__clang__)
#define FLATBUFFERS_CLANG (__clang_major__ * 10000 + __clang_minor__ * 100 + __clang_patchlevel__)
#else
#define FLATBUFFERS_CLANG 0
#endif
/// @cond FLATBUFFERS_INTERNAL
#if __cplusplus <= 199711L && \
(!defined(_MSC_VER) || _MSC_VER < 1600) && \
(!defined(__GNUC__) || \
(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__ < 40400))
#error A C++11 compatible compiler with support for the auto typing is \
required for FlatBuffers.
#error __cplusplus _MSC_VER __GNUC__ __GNUC_MINOR__ __GNUC_PATCHLEVEL__
#endif
#if !defined(__clang__) && \
defined(__GNUC__) && \
(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__ < 40600)
// Backwards compatibility for g++ 4.4 and 4.5, which don't have the nullptr
// and constexpr keywords. Note the __clang__ check is needed, because clang
// presents itself as an older GNUC compiler.
#ifndef nullptr_t
const class nullptr_t {
public:
template<class T> inline operator T*() const { return 0; }
private:
void operator&() const;
} nullptr = {};
#endif
#ifndef constexpr
#define constexpr const
#endif
#endif
// The wire format uses a little endian encoding (since that's efficient for
// the common platforms).
#if defined(__s390x__)
#define FLATBUFFERS_LITTLEENDIAN 0
#endif // __s390x__
#if !defined(FLATBUFFERS_LITTLEENDIAN)
#if defined(__GNUC__) || defined(__clang__) || defined(__ICCARM__)
#if (defined(__BIG_ENDIAN__) || \
(defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__))
#define FLATBUFFERS_LITTLEENDIAN 0
#else
#define FLATBUFFERS_LITTLEENDIAN 1
#endif // __BIG_ENDIAN__
#elif defined(_MSC_VER)
#if defined(_M_PPC)
#define FLATBUFFERS_LITTLEENDIAN 0
#else
#define FLATBUFFERS_LITTLEENDIAN 1
#endif
#else
#error Unable to determine endianness, define FLATBUFFERS_LITTLEENDIAN.
#endif
#endif // !defined(FLATBUFFERS_LITTLEENDIAN)
#define FLATBUFFERS_VERSION_MAJOR 1
#define FLATBUFFERS_VERSION_MINOR 12
#define FLATBUFFERS_VERSION_REVISION 0
#define FLATBUFFERS_STRING_EXPAND(X) #X
#define FLATBUFFERS_STRING(X) FLATBUFFERS_STRING_EXPAND(X)
namespace flatbuffers {
// Returns version as string "MAJOR.MINOR.REVISION".
const char* FLATBUFFERS_VERSION();
}
#if (!defined(_MSC_VER) || _MSC_VER > 1600) && \
(!defined(__GNUC__) || (__GNUC__ * 100 + __GNUC_MINOR__ >= 407)) || \
defined(__clang__)
#define FLATBUFFERS_FINAL_CLASS final
#define FLATBUFFERS_OVERRIDE override
#define FLATBUFFERS_VTABLE_UNDERLYING_TYPE : flatbuffers::voffset_t
#else
#define FLATBUFFERS_FINAL_CLASS
#define FLATBUFFERS_OVERRIDE
#define FLATBUFFERS_VTABLE_UNDERLYING_TYPE
#endif
#if (!defined(_MSC_VER) || _MSC_VER >= 1900) && \
(!defined(__GNUC__) || (__GNUC__ * 100 + __GNUC_MINOR__ >= 406)) || \
(defined(__cpp_constexpr) && __cpp_constexpr >= 200704)
#define FLATBUFFERS_CONSTEXPR constexpr
#else
#define FLATBUFFERS_CONSTEXPR const
#endif
#if (defined(__cplusplus) && __cplusplus >= 201402L) || \
(defined(__cpp_constexpr) && __cpp_constexpr >= 201304)
#define FLATBUFFERS_CONSTEXPR_CPP14 FLATBUFFERS_CONSTEXPR
#else
#define FLATBUFFERS_CONSTEXPR_CPP14
#endif
#if (defined(__GXX_EXPERIMENTAL_CXX0X__) && (__GNUC__ * 100 + __GNUC_MINOR__ >= 406)) || \
(defined(_MSC_FULL_VER) && (_MSC_FULL_VER >= 190023026)) || \
defined(__clang__)
#define FLATBUFFERS_NOEXCEPT noexcept
#else
#define FLATBUFFERS_NOEXCEPT
#endif
// NOTE: the FLATBUFFERS_DELETE_FUNC macro may change the access mode to
// private, so be sure to put it at the end or reset access mode explicitly.
#if (!defined(_MSC_VER) || _MSC_FULL_VER >= 180020827) && \
(!defined(__GNUC__) || (__GNUC__ * 100 + __GNUC_MINOR__ >= 404)) || \
defined(__clang__)
#define FLATBUFFERS_DELETE_FUNC(func) func = delete;
#else
#define FLATBUFFERS_DELETE_FUNC(func) private: func;
#endif
#ifndef FLATBUFFERS_HAS_STRING_VIEW
// Only provide flatbuffers::string_view if __has_include can be used
// to detect a header that provides an implementation
#if defined(__has_include)
// Check for std::string_view (in c++17)
#if __has_include(<string_view>) && (__cplusplus >= 201606 || (defined(_HAS_CXX17) && _HAS_CXX17))
#include <string_view>
namespace flatbuffers {
typedef std::string_view string_view;
}
#define FLATBUFFERS_HAS_STRING_VIEW 1
// Check for std::experimental::string_view (in c++14, compiler-dependent)
#elif __has_include(<experimental/string_view>) && (__cplusplus >= 201411)
#include <experimental/string_view>
namespace flatbuffers {
typedef std::experimental::string_view string_view;
}
#define FLATBUFFERS_HAS_STRING_VIEW 1
// Check for absl::string_view
#elif __has_include("absl/strings/string_view.h")
#include "absl/strings/string_view.h"
namespace flatbuffers {
typedef absl::string_view string_view;
}
#define FLATBUFFERS_HAS_STRING_VIEW 1
#endif
#endif // __has_include
#endif // !FLATBUFFERS_HAS_STRING_VIEW
#ifndef FLATBUFFERS_HAS_NEW_STRTOD
// Modern (C++11) strtod and strtof functions are available for use; they accept:
// 1) nan/inf strings as arguments of strtod;
// 2) hex-floats as arguments of strtod/strtof.
#if (defined(_MSC_VER) && _MSC_VER >= 1900) || \
(defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__ >= 409)) || \
(defined(__clang__))
#define FLATBUFFERS_HAS_NEW_STRTOD 1
#endif
#endif // !FLATBUFFERS_HAS_NEW_STRTOD
#ifndef FLATBUFFERS_LOCALE_INDEPENDENT
// Enable locale-independent functions {strtof_l, strtod_l, strtoll_l, strtoull_l}.
// They are part of POSIX-2008 but not part of the C/C++ standard.
// GCC/Clang declare them when _XOPEN_SOURCE >= 700 (POSIX-2008).
#if ((defined(_MSC_VER) && _MSC_VER >= 1800) || \
(defined(_XOPEN_SOURCE) && (_XOPEN_SOURCE>=700)))
#define FLATBUFFERS_LOCALE_INDEPENDENT 1
#else
#define FLATBUFFERS_LOCALE_INDEPENDENT 0
#endif
#endif // !FLATBUFFERS_LOCALE_INDEPENDENT
// Suppress Undefined Behavior Sanitizer (recoverable only). Usage:
// - __supress_ubsan__("undefined")
// - __supress_ubsan__("signed-integer-overflow")
#if defined(__clang__) && (__clang_major__ > 3 || (__clang_major__ == 3 && __clang_minor__ >=7))
#define __supress_ubsan__(type) __attribute__((no_sanitize(type)))
#elif defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__ >= 409)
#define __supress_ubsan__(type) __attribute__((no_sanitize_undefined))
#else
#define __supress_ubsan__(type)
#endif
// This is a constexpr function used for checking compile-time constants.
// Avoid `#pragma warning(disable: 4127) // C4127: expression is constant`.
template<typename T> FLATBUFFERS_CONSTEXPR inline bool IsConstTrue(T t) {
return !!t;
}
// Enable the C++ [[attribute]] syntax when compiling as C++17 or higher.
#if ((__cplusplus >= 201703L) \
|| (defined(_MSVC_LANG) && (_MSVC_LANG >= 201703L)))
// All attributes unknown to an implementation are ignored without causing an error.
#define FLATBUFFERS_ATTRIBUTE(attr) [[attr]]
#define FLATBUFFERS_FALLTHROUGH() [[fallthrough]]
#else
#define FLATBUFFERS_ATTRIBUTE(attr)
#if FLATBUFFERS_CLANG >= 30800
#define FLATBUFFERS_FALLTHROUGH() [[clang::fallthrough]]
#elif FLATBUFFERS_GCC >= 70300
#define FLATBUFFERS_FALLTHROUGH() [[gnu::fallthrough]]
#else
#define FLATBUFFERS_FALLTHROUGH()
#endif
#endif
/// @endcond
/// @file
namespace flatbuffers {
/// @cond FLATBUFFERS_INTERNAL
// Our default offset / size type, 32bit on purpose on 64bit systems.
// Also, using a consistent offset type maintains compatibility of serialized
// offset values between 32bit and 64bit systems.
typedef uint32_t uoffset_t;
// Signed offsets for references that can go in both directions.
typedef int32_t soffset_t;
// Offset/index used in v-tables, can be changed to uint8_t in
// format forks to save a bit of space if desired.
typedef uint16_t voffset_t;
typedef uintmax_t largest_scalar_t;
// In 32bits, this evaluates to 2GB - 1
#define FLATBUFFERS_MAX_BUFFER_SIZE ((1ULL << (sizeof(::flatbuffers::soffset_t) * 8 - 1)) - 1)
// We support aligning the contents of buffers up to this size.
#define FLATBUFFERS_MAX_ALIGNMENT 16
#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable: 4127) // C4127: conditional expression is constant
#endif
template<typename T> T EndianSwap(T t) {
#if defined(_MSC_VER)
#define FLATBUFFERS_BYTESWAP16 _byteswap_ushort
#define FLATBUFFERS_BYTESWAP32 _byteswap_ulong
#define FLATBUFFERS_BYTESWAP64 _byteswap_uint64
#elif defined(__ICCARM__)
#define FLATBUFFERS_BYTESWAP16 __REV16
#define FLATBUFFERS_BYTESWAP32 __REV
#define FLATBUFFERS_BYTESWAP64(x) \
((__REV(static_cast<uint32_t>(x >> 32U))) | (static_cast<uint64_t>(__REV(static_cast<uint32_t>(x)))) << 32U)
#else
#if defined(__GNUC__) && __GNUC__ * 100 + __GNUC_MINOR__ < 408 && !defined(__clang__)
// __builtin_bswap16 was missing prior to GCC 4.8.
#define FLATBUFFERS_BYTESWAP16(x) \
static_cast<uint16_t>(__builtin_bswap32(static_cast<uint32_t>(x) << 16))
#else
#define FLATBUFFERS_BYTESWAP16 __builtin_bswap16
#endif
#define FLATBUFFERS_BYTESWAP32 __builtin_bswap32
#define FLATBUFFERS_BYTESWAP64 __builtin_bswap64
#endif
if (sizeof(T) == 1) { // Compile-time if-then's.
return t;
} else if (sizeof(T) == 2) {
union { T t; uint16_t i; } u = { t };
u.i = FLATBUFFERS_BYTESWAP16(u.i);
return u.t;
} else if (sizeof(T) == 4) {
union { T t; uint32_t i; } u = { t };
u.i = FLATBUFFERS_BYTESWAP32(u.i);
return u.t;
} else if (sizeof(T) == 8) {
union { T t; uint64_t i; } u = { t };
u.i = FLATBUFFERS_BYTESWAP64(u.i);
return u.t;
} else {
FLATBUFFERS_ASSERT(0);
return t;
}
}
#if defined(_MSC_VER)
#pragma warning(pop)
#endif
template<typename T> T EndianScalar(T t) {
#if FLATBUFFERS_LITTLEENDIAN
return t;
#else
return EndianSwap(t);
#endif
}
template<typename T>
// UBSAN: C++ aliasing type rules, see std::bit_cast<> for details.
__supress_ubsan__("alignment")
T ReadScalar(const void *p) {
return EndianScalar(*reinterpret_cast<const T *>(p));
}
template<typename T>
// UBSAN: C++ aliasing type rules, see std::bit_cast<> for details.
__supress_ubsan__("alignment")
void WriteScalar(void *p, T t) {
*reinterpret_cast<T *>(p) = EndianScalar(t);
}
template<typename T> struct Offset;
template<typename T> __supress_ubsan__("alignment") void WriteScalar(void *p, Offset<T> t) {
*reinterpret_cast<uoffset_t *>(p) = EndianScalar(t.o);
}
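// Illustrative example, not part of the original header: WriteScalar/ReadScalar
// round-trip a value through raw memory in wire (little-endian) byte order,
// regardless of host endianness.
inline bool ScalarRoundTripExample() {
  uint8_t buf[sizeof(uint32_t)] = {0};
  WriteScalar<uint32_t>(buf, 0x11223344u);  // stored in memory as 44 33 22 11
  return ReadScalar<uint32_t>(buf) == 0x11223344u;
}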
// Computes how many bytes you'd have to pad to be able to write a
// "scalar_size" scalar if the buffer had grown to "buf_size" (downwards in
// memory).
__supress_ubsan__("unsigned-integer-overflow")
inline size_t PaddingBytes(size_t buf_size, size_t scalar_size) {
return ((~buf_size) + 1) & (scalar_size - 1);
}
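// Illustrative example, not part of the original header: writing a 4-byte
// scalar into a buffer that has grown to 13 bytes needs 3 bytes of padding so
// that the buffer size becomes a multiple of the scalar size.
inline void PaddingBytesExample() {
  FLATBUFFERS_ASSERT(PaddingBytes(13, 4) == 3);  // 13 + 3 == 16
  FLATBUFFERS_ASSERT(PaddingBytes(16, 4) == 0);  // already aligned
  FLATBUFFERS_ASSERT(PaddingBytes(1, 8) == 7);   // 1 + 7 == 8
}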
} // namespace flatbuffers
#endif // FLATBUFFERS_BASE_H_

File diff suppressed because it is too large.


@@ -0,0 +1,307 @@
/*
* Copyright 2017 Google Inc. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef FLATBUFFERS_STL_EMULATION_H_
#define FLATBUFFERS_STL_EMULATION_H_
// clang-format off
#include <string>
#include <type_traits>
#include <vector>
#include <memory>
#include <limits>
#if defined(_STLPORT_VERSION) && !defined(FLATBUFFERS_CPP98_STL)
#define FLATBUFFERS_CPP98_STL
#endif // defined(_STLPORT_VERSION) && !defined(FLATBUFFERS_CPP98_STL)
#if defined(FLATBUFFERS_CPP98_STL)
#include <cctype>
#endif // defined(FLATBUFFERS_CPP98_STL)
// Check if we can use template aliases
// Not possible with Microsoft compilers before 2012.
// Possible if the language feature __cpp_alias_templates is defined.
// Or possible if the C++ standard is C++11 or newer.
#if (defined(_MSC_VER) && _MSC_VER > 1700 /* MSVC2012 */) \
|| (defined(__cpp_alias_templates) && __cpp_alias_templates >= 200704) \
|| (defined(__cplusplus) && __cplusplus >= 201103L)
#define FLATBUFFERS_TEMPLATES_ALIASES
#endif
// This header provides backwards compatibility for C++98 STLs like stlport.
namespace flatbuffers {
// Retrieve ::back() from a string in a way that is compatible with pre-C++11
// STLs (e.g. stlport).
inline char& string_back(std::string &value) {
return value[value.length() - 1];
}
inline char string_back(const std::string &value) {
return value[value.length() - 1];
}
// Helper method that retrieves ::data() from a vector in a way that is
// compatible with pre-C++11 STLs (e.g. stlport).
template <typename T> inline T *vector_data(std::vector<T> &vector) {
// In some debug environments, operator[] does bounds checking, so &vector[0]
// can't be used.
return vector.empty() ? nullptr : &vector[0];
}
template <typename T> inline const T *vector_data(
const std::vector<T> &vector) {
return vector.empty() ? nullptr : &vector[0];
}
template <typename T, typename V>
inline void vector_emplace_back(std::vector<T> *vector, V &&data) {
#if defined(FLATBUFFERS_CPP98_STL)
vector->push_back(data);
#else
vector->emplace_back(std::forward<V>(data));
#endif // defined(FLATBUFFERS_CPP98_STL)
}
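// Illustrative example, not part of the original header: the helpers above
// behave like std::vector::data() and emplace_back() on C++11 STLs and fall
// back to &v[0] / push_back() on pre-C++11 STLs.
inline const int *VectorHelpersExample(std::vector<int> *v) {
  vector_emplace_back(v, 42);  // emplace_back, or push_back on C++98 STLs
  return vector_data(*v);      // never dereferences an empty vector
}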
#ifndef FLATBUFFERS_CPP98_STL
#if defined(FLATBUFFERS_TEMPLATES_ALIASES)
template <typename T>
using numeric_limits = std::numeric_limits<T>;
#else
template <typename T> class numeric_limits :
public std::numeric_limits<T> {};
#endif // defined(FLATBUFFERS_TEMPLATES_ALIASES)
#else
template <typename T> class numeric_limits :
public std::numeric_limits<T> {
public:
// Android NDK fix.
static T lowest() {
return std::numeric_limits<T>::min();
}
};
template <> class numeric_limits<float> :
public std::numeric_limits<float> {
public:
static float lowest() { return -FLT_MAX; }
};
template <> class numeric_limits<double> :
public std::numeric_limits<double> {
public:
static double lowest() { return -DBL_MAX; }
};
template <> class numeric_limits<unsigned long long> {
public:
static unsigned long long min() { return 0ULL; }
static unsigned long long max() { return ~0ULL; }
static unsigned long long lowest() {
return numeric_limits<unsigned long long>::min();
}
};
template <> class numeric_limits<long long> {
public:
static long long min() {
return static_cast<long long>(1ULL << ((sizeof(long long) << 3) - 1));
}
static long long max() {
return static_cast<long long>(
(1ULL << ((sizeof(long long) << 3) - 1)) - 1);
}
static long long lowest() {
return numeric_limits<long long>::min();
}
};
#endif // FLATBUFFERS_CPP98_STL
#if defined(FLATBUFFERS_TEMPLATES_ALIASES)
#ifndef FLATBUFFERS_CPP98_STL
template <typename T> using is_scalar = std::is_scalar<T>;
template <typename T, typename U> using is_same = std::is_same<T,U>;
template <typename T> using is_floating_point = std::is_floating_point<T>;
template <typename T> using is_unsigned = std::is_unsigned<T>;
template <typename T> using is_enum = std::is_enum<T>;
template <typename T> using make_unsigned = std::make_unsigned<T>;
template<bool B, class T, class F>
using conditional = std::conditional<B, T, F>;
template<class T, T v>
using integral_constant = std::integral_constant<T, v>;
#else
// Map C++ TR1 templates defined by stlport.
template <typename T> using is_scalar = std::tr1::is_scalar<T>;
template <typename T, typename U> using is_same = std::tr1::is_same<T,U>;
template <typename T> using is_floating_point =
std::tr1::is_floating_point<T>;
template <typename T> using is_unsigned = std::tr1::is_unsigned<T>;
template <typename T> using is_enum = std::tr1::is_enum<T>;
// Android NDK doesn't have std::make_unsigned or std::tr1::make_unsigned.
template<typename T> struct make_unsigned {
static_assert(is_unsigned<T>::value, "Specialization not implemented!");
using type = T;
};
template<> struct make_unsigned<char> { using type = unsigned char; };
template<> struct make_unsigned<short> { using type = unsigned short; };
template<> struct make_unsigned<int> { using type = unsigned int; };
template<> struct make_unsigned<long> { using type = unsigned long; };
template<>
struct make_unsigned<long long> { using type = unsigned long long; };
template<bool B, class T, class F>
using conditional = std::tr1::conditional<B, T, F>;
template<class T, T v>
using integral_constant = std::tr1::integral_constant<T, v>;
#endif // !FLATBUFFERS_CPP98_STL
#else
// MSVC 2010 doesn't support C++11 aliases.
template <typename T> struct is_scalar : public std::is_scalar<T> {};
template <typename T, typename U> struct is_same : public std::is_same<T,U> {};
template <typename T> struct is_floating_point :
public std::is_floating_point<T> {};
template <typename T> struct is_unsigned : public std::is_unsigned<T> {};
template <typename T> struct is_enum : public std::is_enum<T> {};
template <typename T> struct make_unsigned : public std::make_unsigned<T> {};
template<bool B, class T, class F>
struct conditional : public std::conditional<B, T, F> {};
template<class T, T v>
struct integral_constant : public std::integral_constant<T, v> {};
#endif // defined(FLATBUFFERS_TEMPLATES_ALIASES)
#ifndef FLATBUFFERS_CPP98_STL
#if defined(FLATBUFFERS_TEMPLATES_ALIASES)
template <class T> using unique_ptr = std::unique_ptr<T>;
#else
// MSVC 2010 doesn't support C++11 aliases.
// We're manually "aliasing" the class here as we want to bring unique_ptr
// into the flatbuffers namespace. Because unique_ptr lives in the flatbuffers
// namespace, we also get a completely independent implementation (see below)
// for C++98 STL implementations.
template <class T> class unique_ptr : public std::unique_ptr<T> {
public:
unique_ptr() {}
explicit unique_ptr(T* p) : std::unique_ptr<T>(p) {}
unique_ptr(std::unique_ptr<T>&& u) { *this = std::move(u); }
unique_ptr(unique_ptr&& u) { *this = std::move(u); }
unique_ptr& operator=(std::unique_ptr<T>&& u) {
std::unique_ptr<T>::reset(u.release());
return *this;
}
unique_ptr& operator=(unique_ptr&& u) {
std::unique_ptr<T>::reset(u.release());
return *this;
}
unique_ptr& operator=(T* p) {
return std::unique_ptr<T>::operator=(p);
}
};
#endif // defined(FLATBUFFERS_TEMPLATES_ALIASES)
#else
// Very limited implementation of unique_ptr.
// This is provided simply to allow the C++ code generated from the default
// settings to function in C++98 environments with no modifications.
template <class T> class unique_ptr {
public:
typedef T element_type;
unique_ptr() : ptr_(nullptr) {}
explicit unique_ptr(T* p) : ptr_(p) {}
unique_ptr(unique_ptr&& u) : ptr_(nullptr) { reset(u.release()); }
unique_ptr(const unique_ptr& u) : ptr_(nullptr) {
reset(const_cast<unique_ptr*>(&u)->release());
}
~unique_ptr() { reset(); }
unique_ptr& operator=(const unique_ptr& u) {
reset(const_cast<unique_ptr*>(&u)->release());
return *this;
}
unique_ptr& operator=(unique_ptr&& u) {
reset(u.release());
return *this;
}
unique_ptr& operator=(T* p) {
reset(p);
return *this;
}
const T& operator*() const { return *ptr_; }
T* operator->() const { return ptr_; }
T* get() const noexcept { return ptr_; }
explicit operator bool() const { return ptr_ != nullptr; }
// modifiers
T* release() {
T* value = ptr_;
ptr_ = nullptr;
return value;
}
void reset(T* p = nullptr) {
T* value = ptr_;
ptr_ = p;
if (value) delete value;
}
void swap(unique_ptr& u) {
T* temp_ptr = ptr_;
ptr_ = u.ptr_;
u.ptr_ = temp_ptr;
}
private:
T* ptr_;
};
template <class T> bool operator==(const unique_ptr<T>& x,
const unique_ptr<T>& y) {
return x.get() == y.get();
}
template <class T, class D> bool operator==(const unique_ptr<T>& x,
const D* y) {
return static_cast<D*>(x.get()) == y;
}
template <class T> bool operator==(const unique_ptr<T>& x, intptr_t y) {
return reinterpret_cast<intptr_t>(x.get()) == y;
}
template <class T> bool operator!=(const unique_ptr<T>& x, decltype(nullptr)) {
return !!x;
}
template <class T> bool operator!=(decltype(nullptr), const unique_ptr<T>& x) {
return !!x;
}
template <class T> bool operator==(const unique_ptr<T>& x, decltype(nullptr)) {
return !x;
}
template <class T> bool operator==(decltype(nullptr), const unique_ptr<T>& x) {
return !x;
}
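// Illustrative example, not part of the original header: the minimal
// unique_ptr above supports the ownership idioms used by generated code.
inline bool UniquePtrExample() {
  unique_ptr<int> p(new int(7));
  unique_ptr<int> q;
  q = p.release();            // transfer raw ownership
  return !p && q && *q == 7;  // p is now empty, q owns the int
}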
#endif // !FLATBUFFERS_CPP98_STL
} // namespace flatbuffers
#endif // FLATBUFFERS_STL_EMULATION_H_


@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -0,0 +1,900 @@
// Copyright 2015 The Gemmlowp Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// fixedpoint.h: fixed-point arithmetic, with basic operations and
// a few math functions such as tanh.
#ifndef GEMMLOWP_INTERNAL_FIXEDPOINT_H_
#define GEMMLOWP_INTERNAL_FIXEDPOINT_H_
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include "../internal/detect_platform.h"
namespace gemmlowp {
// Part 1: Low-level integer-arithmetic primitives.
// The implementations here are generic implementations valid for
// scalar types (e.g. std::int32_t). Architecture-specific SIMD types
// (e.g. NEON int32x4_t) may be supported by providing
// specializations for them in separate files.
//
// The purpose of these primitives is two-fold:
// - They will be used to implement higher-level fixed-point
// abstractions, namely the FixedPoint class and its arithmetic
// operators.
// - They will be directly used to implement some more involved
// fixed-point computations, e.g. the fixed-point implementation
// of math functions such as tanh.
// Some compile-time traits around raw types to handle SIMD aspects:
// number of lanes, underlying scalar type.
template <typename tIntegerType>
struct FixedPointRawTypeTraits {};
template <>
struct FixedPointRawTypeTraits<std::int32_t> {
typedef std::int32_t ScalarRawType;
static constexpr int kLanes = 1;
};
template <>
struct FixedPointRawTypeTraits<std::int16_t> {
typedef std::int16_t ScalarRawType;
static constexpr int kLanes = 1;
};
// Returns a SIMD value duplicating a scalar value across all lanes.
template <typename tRawType>
tRawType Dup(typename FixedPointRawTypeTraits<tRawType>::ScalarRawType x) {
return x;
}
// Plain bit-wise AND
template <typename tIntegerType>
tIntegerType BitAnd(tIntegerType a, tIntegerType b) {
return a & b;
}
// Plain bit-wise OR
template <typename tIntegerType>
tIntegerType BitOr(tIntegerType a, tIntegerType b) {
return a | b;
}
// Plain bit-wise XOR
template <typename tIntegerType>
tIntegerType BitXor(tIntegerType a, tIntegerType b) {
return a ^ b;
}
// Plain bit-wise NOT
template <typename tIntegerType>
tIntegerType BitNot(tIntegerType a) {
return ~a;
}
// Integer addition. Not saturating. Overflow is undefined behavior.
template <typename tIntegerType>
tIntegerType Add(tIntegerType a, tIntegerType b) {
return a + b;
}
// Integer multiplication. Not saturating. Overflow is undefined behavior.
template <typename tIntegerType>
tIntegerType Mul(tIntegerType a, tIntegerType b) {
return a * b;
}
// Integer subtraction. Not saturating. Overflow is undefined behavior.
template <typename tIntegerType>
tIntegerType Sub(tIntegerType a, tIntegerType b) {
return a - b;
}
// Integer unary negative. Not saturating. Overflow is undefined behavior.
template <typename tIntegerType>
tIntegerType Neg(tIntegerType a) {
return -a;
}
// Integer arithmetic left-shift, equivalent to multiplying with a power of two.
// Negative values are OK. In case of overflow, no Undefined
// Behavior, but the results are implementation-defined (in practice,
// they currently are saturated, but we make no commitment to that). The idea
// is that the caller will want to implement the overflowing cases with
// saturation with compare-and-mask, so we don't care about the results
// in the overflow case, we just want to avoid undefined behavior.
//
// tIntegerType may be int32 or any narrower signed type.
template <typename tIntegerType>
tIntegerType ShiftLeft(tIntegerType a, int offset) {
const std::int64_t wide_a = static_cast<std::int64_t>(a);
const std::int64_t wide_shifted = wide_a * (1 << offset);
const auto min = std::numeric_limits<tIntegerType>::min();
const auto max = std::numeric_limits<tIntegerType>::max();
return wide_shifted < min
? min
: wide_shifted > max ? max
: static_cast<tIntegerType>(wide_shifted);
}
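// Illustrative example, not part of the original header: the shift is done in
// 64-bit and clamped, so overflowing shifts saturate instead of invoking
// undefined behavior.
inline void ShiftLeftExample() {
  assert(ShiftLeft<std::int32_t>(1, 4) == 16);
  assert(ShiftLeft<std::int32_t>(1 << 30, 2) ==
         std::numeric_limits<std::int32_t>::max());  // saturated
}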
// Integer arithmetic right-shift. Not rounding.
// Relying on implementation-defined, but in-practice-consistent,
// C++ compiler behavior.
template <typename tIntegerType>
tIntegerType ShiftRight(tIntegerType a, int offset) {
return a >> offset;
}
// Each bit of the result is set to the corresponding bit of either then_val or
// else_val depending on whether the corresponding bit of if_mask is set.
// Equivalent to the VBSL instruction in ARM NEON.
template <typename tIntegerType>
tIntegerType SelectUsingMask(tIntegerType if_mask, tIntegerType then_val,
tIntegerType else_val) {
return BitXor(BitAnd(if_mask, then_val), BitAnd(BitNot(if_mask), else_val));
}
// For each input scalar, the corresponding bits of the result are set if the
// input scalar is non-zero.
template <typename tIntegerType>
tIntegerType MaskIfNonZero(tIntegerType a) {
static constexpr tIntegerType zero = 0;
return a ? BitNot(zero) : zero;
}
// For each input scalar, the corresponding bits of the result are set if the
// input scalar is zero.
template <typename tIntegerType>
tIntegerType MaskIfZero(tIntegerType a) {
return MaskIfNonZero<tIntegerType>(!a);
}
// For each pair of input scalars, the corresponding bits of the result are
// set if the input scalars are equal.
template <typename tIntegerType>
tIntegerType MaskIfEqual(tIntegerType a, tIntegerType b) {
return MaskIfNonZero<tIntegerType>(a == b);
}
// For each pair of input scalars, the corresponding bits of the result are
// set if the input scalars are not equal.
template <typename tIntegerType>
tIntegerType MaskIfNotEqual(tIntegerType a, tIntegerType b) {
return MaskIfNonZero<tIntegerType>(a != b);
}
// For each pair of input scalars, the corresponding bits of the result are
// set if the input scalars a, b satisfy a > b.
template <typename tIntegerType>
tIntegerType MaskIfGreaterThan(tIntegerType a, tIntegerType b) {
return MaskIfNonZero<tIntegerType>(a > b);
}
// For each pair of input scalars, the corresponding bits of the result are
// set if the input scalars a, b satisfy a >= b.
template <typename tIntegerType>
tIntegerType MaskIfGreaterThanOrEqual(tIntegerType a, tIntegerType b) {
return MaskIfNonZero<tIntegerType>(a >= b);
}
// For each pair of input scalars, the corresponding bits of the result are
// set if the input scalars a, b satisfy a < b.
template <typename tIntegerType>
tIntegerType MaskIfLessThan(tIntegerType a, tIntegerType b) {
return MaskIfNonZero<tIntegerType>(a < b);
}
// For each pair of input scalars, the corresponding bits of the result are
// set if the input scalars a, b satisfy a <= b.
template <typename tIntegerType>
tIntegerType MaskIfLessThanOrEqual(tIntegerType a, tIntegerType b) {
return MaskIfNonZero<tIntegerType>(a <= b);
}
// Returns true if all of the input scalars are nonzero.
// This function may currently assume that each of the input scalars has either
// all or none of its bits set. Otherwise, its behavior is currently undefined.
template <typename tIntegerType>
bool All(tIntegerType a) {
return a;
}
// Returns true if any of the input scalars are nonzero.
// This function may currently assume that each of the input scalars has either
// all or none of its bits set. Otherwise, its behavior is currently undefined.
template <typename tIntegerType>
bool Any(tIntegerType a) {
return a;
}
// Returns (a+b)/2, rounded to the nearest integer.
// Equivalent to VRHADD in the ARM NEON instruction set.
template <typename IntegerType>
IntegerType RoundingHalfSum(IntegerType a, IntegerType b) {
static_assert(std::is_same<IntegerType, void>::value, "unimplemented");
(void)b;
return a;
}
template <>
inline std::int32_t RoundingHalfSum(std::int32_t a, std::int32_t b) {
std::int64_t a64 = a;
std::int64_t b64 = b;
std::int64_t sum = a64 + b64;
std::int64_t sign = sum >= 0 ? 1 : -1;
return static_cast<std::int32_t>((sum + sign) / 2);
}
template <>
inline std::int16_t RoundingHalfSum(std::int16_t a, std::int16_t b) {
std::int32_t a32 = a;
std::int32_t b32 = b;
std::int32_t sum = a32 + b32;
std::int32_t sign = sum >= 0 ? 1 : -1;
return static_cast<std::int16_t>((sum + sign) / 2);
}
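// Illustrative example, not part of the original header: the sign nudge makes
// exact halves round away from zero, matching VRHADD.
inline void RoundingHalfSumExample() {
  assert(RoundingHalfSum<std::int32_t>(3, 4) == 4);     // (3 + 4 + 1) / 2
  assert(RoundingHalfSum<std::int32_t>(-3, -4) == -4);  // (-7 - 1) / 2
  assert(RoundingHalfSum<std::int32_t>(2, 4) == 3);     // exact average
}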
template <typename IntegerType>
IntegerType SaturatingAdd(IntegerType a, IntegerType b) {
static_assert(std::is_same<IntegerType, void>::value, "unimplemented");
(void)b;
return a;
}
// So far this is only needed for int16.
template <>
inline std::int16_t SaturatingAdd(std::int16_t a, std::int16_t b) {
std::int32_t a32 = a;
std::int32_t b32 = b;
std::int32_t sum = a32 + b32;
return static_cast<std::int16_t>(
std::min(static_cast<std::int32_t>(32767),
std::max(static_cast<std::int32_t>(-32768), sum)));
}
// Returns a+b, saturating if the integers are 16bit or narrower,
// otherwise just a plain addition.
template <typename IntegerType, bool Is16Bit>
struct AddSaturatingIf16BitImpl {
static IntegerType Run(IntegerType a, IntegerType b) { return Add(a, b); }
};
template <typename IntegerType>
struct AddSaturatingIf16BitImpl<IntegerType, true> {
static IntegerType Run(IntegerType a, IntegerType b) {
return SaturatingAdd(a, b);
}
};
template <typename IntegerType>
IntegerType AddSaturatingIf16Bit(IntegerType a, IntegerType b) {
using ScalarType =
typename FixedPointRawTypeTraits<IntegerType>::ScalarRawType;
return AddSaturatingIf16BitImpl<IntegerType, sizeof(ScalarType) == 2>::Run(a,
b);
}
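// Illustrative example, not part of the original header: the saturating path
// is taken only for 16-bit scalars; wider types use plain addition.
inline void SaturatingAddExample() {
  assert(SaturatingAdd(std::int16_t(30000), std::int16_t(10000)) == 32767);
  assert(AddSaturatingIf16Bit(std::int32_t(30000), std::int32_t(10000)) == 40000);
}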
// Returns the integer that represents the product of two fixed-point
// numbers, interpreting all integers as fixed-point values in the
// interval [-1, 1), rounding to the nearest value, and saturating
// -1 * -1 to the maximum value (since 1 is not in the half-open
// interval [-1, 1)).
//
// [The explanation below specializes to std::int32_t for example purpose.]
//
// The mapping between IntegerType and the interval [-1, 1) is unique and
// implied by IntegerType, which is assumed to be signed. For example,
// for IntegerType==std::int32_t, the mapping is
// real_value = integer_value / 2^31.
// So in this case, and leaving aside rounding and saturating, this
// function computes ((a / 2^31) * (b / 2^31)) * 2^31, which simplifies to
// (a * b) / 2^31.
//
// The 'doubling' part in the name of this function comes from the fact that
// this operation is very close to a "multiply-high" operation, keeping only
// the top half bits, except that that would be effectively computing
// (a * b) / 2^32,
// so here we are computing 2x that, since
// 1/2^31 = 2 * 1/2^32.
// The idea is to use all of the available 32 bits in the destination int32
// value.
//
// [End of the explanation specializing to int32.]
//
// This is equivalent to the VQRDMULH instruction in ARM NEON.
template <typename IntegerType>
IntegerType SaturatingRoundingDoublingHighMul(IntegerType a, IntegerType b) {
static_assert(std::is_same<IntegerType, void>::value, "unimplemented");
(void)b;
return a;
}
// This function implements the same computation as the ARMv7 NEON VQRDMULH
// instruction.
template <>
inline std::int32_t SaturatingRoundingDoublingHighMul(std::int32_t a,
std::int32_t b) {
bool overflow = a == b && a == std::numeric_limits<std::int32_t>::min();
std::int64_t a_64(a);
std::int64_t b_64(b);
std::int64_t ab_64 = a_64 * b_64;
std::int32_t nudge = ab_64 >= 0 ? (1 << 30) : (1 - (1 << 30));
std::int32_t ab_x2_high32 =
static_cast<std::int32_t>((ab_64 + nudge) / (1ll << 31));
return overflow ? std::numeric_limits<std::int32_t>::max() : ab_x2_high32;
}
template <>
inline std::int16_t SaturatingRoundingDoublingHighMul(std::int16_t a,
std::int16_t b) {
bool overflow = a == b && a == std::numeric_limits<std::int16_t>::min();
std::int32_t a_32(a);
std::int32_t b_32(b);
std::int32_t ab_32 = a_32 * b_32;
std::int16_t nudge = ab_32 >= 0 ? (1 << 14) : (1 - (1 << 14));
std::int16_t ab_x2_high16 =
static_cast<std::int16_t>((ab_32 + nudge) / (1 << 15));
return overflow ? std::numeric_limits<std::int16_t>::max() : ab_x2_high16;
}
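// Illustrative example, not part of the original header: with the Q0.31
// interpretation real_value = raw / 2^31, 0.5 * 0.5 rounds to 0.25, and the
// single overflowing case -1 * -1 saturates to the maximum raw value.
inline void SaturatingRoundingDoublingHighMulExample() {
  const std::int32_t half = 1 << 30;  // represents 0.5
  assert(SaturatingRoundingDoublingHighMul(half, half) == (1 << 29));  // 0.25
  const std::int32_t lowest = std::numeric_limits<std::int32_t>::min();
  assert(SaturatingRoundingDoublingHighMul(lowest, lowest) ==
         std::numeric_limits<std::int32_t>::max());
}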
// Correctly-rounded-to-nearest division by a power-of-two.
// Also known as a rounding arithmetic right shift.
template <typename IntegerType>
inline IntegerType RoundingDivideByPOT(IntegerType x, int exponent) {
assert(exponent >= 0);
assert(exponent <= 31);
const IntegerType mask = Dup<IntegerType>((1ll << exponent) - 1);
const IntegerType zero = Dup<IntegerType>(0);
const IntegerType one = Dup<IntegerType>(1);
const IntegerType remainder = BitAnd(x, mask);
const IntegerType threshold =
Add(ShiftRight(mask, 1), BitAnd(MaskIfLessThan(x, zero), one));
return Add(ShiftRight(x, exponent),
BitAnd(MaskIfGreaterThan(remainder, threshold), one));
}
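// Illustrative example, not part of the original header: division rounds to
// nearest, with exact halves rounded away from zero.
inline void RoundingDivideByPOTExample() {
  assert(RoundingDivideByPOT<std::int32_t>(7, 2) == 2);    //  1.75 ->  2
  assert(RoundingDivideByPOT<std::int32_t>(5, 1) == 3);    //  2.5  ->  3
  assert(RoundingDivideByPOT<std::int32_t>(-5, 1) == -3);  // -2.5  -> -3
}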
// Returns the product of a run-time integer value by a compile-time power
// of two, with either a positive exponent (equivalent to an arithmetic
// left shift, saturating) or a negative exponent (equivalent to an arithmetic
// right shift, rounding to nearest).
template <int Exponent, typename IntegerType,
int ExponentSign = (Exponent > 0 ? 1 : Exponent < 0 ? -1 : 0)>
struct ImplSaturatingRoundingMultiplyByPOT {};
template <int Exponent, typename IntegerType>
struct ImplSaturatingRoundingMultiplyByPOT<Exponent, IntegerType, 0> {
static IntegerType eval(IntegerType x) { return x; }
};
template <int Exponent, typename IntegerType>
struct ImplSaturatingRoundingMultiplyByPOT<Exponent, IntegerType, 1> {
static IntegerType eval(IntegerType x) {
using ScalarIntegerType =
typename FixedPointRawTypeTraits<IntegerType>::ScalarRawType;
const IntegerType min =
Dup<IntegerType>(std::numeric_limits<ScalarIntegerType>::min());
const IntegerType max =
Dup<IntegerType>(std::numeric_limits<ScalarIntegerType>::max());
const int ScalarIntegerTypeBits = 8 * sizeof(ScalarIntegerType);
const std::int32_t threshold =
((1 << (ScalarIntegerTypeBits - 1 - Exponent)) - 1);
const IntegerType positive_mask =
MaskIfGreaterThan(x, Dup<IntegerType>(threshold));
const IntegerType negative_mask =
MaskIfLessThan(x, Dup<IntegerType>(-threshold));
IntegerType result = ShiftLeft(x, Exponent);
result = SelectUsingMask(positive_mask, max, result);
result = SelectUsingMask(negative_mask, min, result);
return result;
}
};
template <int Exponent, typename IntegerType>
struct ImplSaturatingRoundingMultiplyByPOT<Exponent, IntegerType, -1> {
static IntegerType eval(IntegerType x) {
return RoundingDivideByPOT<IntegerType>(x, -Exponent);
}
};
template <int Exponent, typename IntegerType>
IntegerType SaturatingRoundingMultiplyByPOT(IntegerType x) {
return ImplSaturatingRoundingMultiplyByPOT<Exponent, IntegerType>::eval(x);
}
// Part 2: the FixedPoint class.
// A FixedPoint object represents a fixed-point value stored in the underlying
// integer type tRawType, if tRawType is a plain scalar integer type.
// Alternatively, tRawType may be a SIMD type (e.g. NEON int32x4_t) in which
// case a FixedPoint object represents a corresponding SIMD vector of fixed
// point values.
//
// tIntegerBits describes the range of the fixed-point format: if
// tIntegerBits == m then the range of representable values is the half-open
// interval [-2^m; 2^m) where the open boundary on the right side means that
// 2^m is not representable (how close the maximum representable value gets
// to 2^m depends on the bit depth of tRawType).
//
// In "Q format notation",
// https://en.wikipedia.org/wiki/Q_(number_format)
// we are describing the format
// Qm.n
// where
// m = tIntegerBits
// and
// n = NumberOfBits(tRawType) - (m + 1)
// Note that the (m + 1) in the above line is because we adopt the convention
// that we count the integer bits exclusively of the sign bit; so (m + 1) is
// the total number of integer bits inclusive of the sign bit.
//
// Accordingly, the number of integral representable values in our range
// [-2^m ; 2^m)
// is equal to 2^(m+1).
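//
// An informal example: FixedPoint<std::int32_t, 0> is the Q0.31 format, in
// which the raw value 1 << 30 represents 0.5 and the largest representable
// value is (2^31 - 1) / 2^31, just below 1. Likewise,
// FixedPoint<std::int32_t, 2> is Q2.29, covering [-4; 4) with a resolution
// of 2^-29.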
template <typename tRawType, int tIntegerBits>
class FixedPoint {
public:
typedef tRawType RawType;
typedef FixedPointRawTypeTraits<RawType> RawTypeTraits;
typedef typename RawTypeTraits::ScalarRawType ScalarRawType;
static constexpr int kTotalBits = 8 * sizeof(ScalarRawType);
static constexpr int kIntegerBits = tIntegerBits;
static constexpr int kFractionalBits = kTotalBits - 1 - kIntegerBits;
static_assert(kIntegerBits >= 0 && kIntegerBits < kTotalBits,
"bad IntegerBits");
typedef FixedPoint<ScalarRawType, kIntegerBits> ScalarFixedPointType;
static const ScalarRawType ScalarRawMin() {
return std::numeric_limits<ScalarRawType>::min();
}
static const ScalarRawType ScalarRawMax() {
return std::numeric_limits<ScalarRawType>::max();
}
static const ScalarRawType RawMin() {
return VectorFromScalar(ScalarRawMin());
}
static const ScalarRawType RawMax() {
return VectorFromScalar(ScalarRawMax());
}
static FixedPoint FromRaw(RawType x) {
FixedPoint retval;
retval.raw() = x;
return retval;
}
static FixedPoint FromScalarRaw(ScalarRawType x) {
FixedPoint retval;
retval.raw() = Dup<RawType>(x);
return retval;
}
static FixedPoint FromScalarFixedPoint(ScalarFixedPointType x) {
return FromScalarRaw(x.raw());
}
template <int Exponent>
static FixedPoint ConstantPOT() {
static constexpr int kOffset = kFractionalBits + Exponent;
static_assert(
kOffset < 31,
"Constant not exactly representable in this fixed-point format");
return FromScalarRaw(ScalarRawType(1) << kOffset);
}
static FixedPoint Zero() { return FromScalarRaw(0); }
static FixedPoint One() {
return FromScalarRaw(
kIntegerBits == 0
? ScalarRawMax()
: (ScalarRawType(1) << (kIntegerBits == 0 ? 0 : kFractionalBits)));
}
static FixedPoint FromDouble(double x) {
const double min_bound = static_cast<double>(ScalarRawMin());
const double max_bound = static_cast<double>(ScalarRawMax());
return FromScalarRaw(static_cast<ScalarRawType>(std::min(
std::max(round(x * static_cast<double>(1ll << kFractionalBits)),
min_bound),
max_bound)));
}
RawType raw() const { return i_; }
RawType& raw() { return i_; }
private:
RawType i_;
};
// Part 3: implementation of arithmetic operators for the
// FixedPoint class, and a few related functions.
// A FixedPoint multiplication is just a
// SaturatingRoundingDoublingHighMul operation on the underlying
// raw integer values. The IntegerBits simply add up, as is obvious
// from the fact that the range is [-2^IntegerBits, 2^IntegerBits).
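//
// An informal example: multiplying a FixedPoint<std::int32_t, 1> (range
// [-2; 2)) by a FixedPoint<std::int32_t, 2> (range [-4; 4)) yields a
// FixedPoint<std::int32_t, 3> (range [-8; 8)). All products of values from
// the two input ranges fit in that range except the single corner case
// (-2) * (-4) == 8, which the saturating multiplication clamps to the
// largest representable value.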
template <typename tRawType, int tIntegerBits_a, int tIntegerBits_b>
FixedPoint<tRawType, tIntegerBits_a + tIntegerBits_b> operator*(
FixedPoint<tRawType, tIntegerBits_a> a,
FixedPoint<tRawType, tIntegerBits_b> b) {
FixedPoint<tRawType, tIntegerBits_a + tIntegerBits_b> c;
c.raw() = SaturatingRoundingDoublingHighMul(a.raw(), b.raw());
return c;
}
// Tweaking IntegerBits gives exact multiplication by a power of two.
template <int tExponent, typename tRawType, int tIntegerBits>
FixedPoint<tRawType, tExponent + tIntegerBits> ExactMulByPot(
FixedPoint<tRawType, tIntegerBits> a) {
FixedPoint<tRawType, tExponent + tIntegerBits> c;
c.raw() = a.raw();
return c;
}
// If we want to leave IntegerBits fixed, then multiplication
// by a power of two has to be saturating/rounding, not exact anymore.
template <int tExponent, typename tRawType, int tIntegerBits>
FixedPoint<tRawType, tIntegerBits> SaturatingRoundingMultiplyByPOT(
FixedPoint<tRawType, tIntegerBits> a) {
return FixedPoint<tRawType, tIntegerBits>::FromRaw(
SaturatingRoundingMultiplyByPOT<tExponent>(a.raw()));
}
// Generic arithmetic operators.
#define MAKE_FIXEDPOINT_UNARY_FUNC(FuncName, ImplFuncName) \
template <typename tRawType, int tIntegerBits> \
FixedPoint<tRawType, tIntegerBits> FuncName( \
FixedPoint<tRawType, tIntegerBits> a) { \
return FixedPoint<tRawType, tIntegerBits>::FromRaw(ImplFuncName(a.raw())); \
}
#define MAKE_FIXEDPOINT_BINARY_FUNC(FuncName, ImplFuncName) \
template <typename tRawType, int tIntegerBits> \
FixedPoint<tRawType, tIntegerBits> FuncName( \
FixedPoint<tRawType, tIntegerBits> a, \
FixedPoint<tRawType, tIntegerBits> b) { \
return FixedPoint<tRawType, tIntegerBits>::FromRaw( \
ImplFuncName(a.raw(), b.raw())); \
}
MAKE_FIXEDPOINT_UNARY_FUNC(operator-, Neg)
MAKE_FIXEDPOINT_UNARY_FUNC(operator~, BitNot)
MAKE_FIXEDPOINT_BINARY_FUNC(operator+, Add)
MAKE_FIXEDPOINT_BINARY_FUNC(operator-, Sub)
MAKE_FIXEDPOINT_BINARY_FUNC(operator&, BitAnd)
MAKE_FIXEDPOINT_BINARY_FUNC(operator^, BitXor)
MAKE_FIXEDPOINT_BINARY_FUNC(operator|, BitOr)
MAKE_FIXEDPOINT_BINARY_FUNC(RoundingHalfSum, RoundingHalfSum)
#undef MAKE_FIXEDPOINT_UNARY_FUNC
#undef MAKE_FIXEDPOINT_BINARY_FUNC
#define MAKE_FIXEDPOINT_UNARY_FUNC_RETURNING_RAW(FuncName) \
template <typename tRawType, int tIntegerBits> \
tRawType FuncName(FixedPoint<tRawType, tIntegerBits> a) { \
return FuncName(a.raw()); \
}
#define MAKE_FIXEDPOINT_BINARY_FUNC_RETURNING_RAW(FuncName) \
template <typename tRawType, int tIntegerBits> \
tRawType FuncName(FixedPoint<tRawType, tIntegerBits> a, \
FixedPoint<tRawType, tIntegerBits> b) { \
return FuncName(a.raw(), b.raw()); \
}
MAKE_FIXEDPOINT_UNARY_FUNC_RETURNING_RAW(MaskIfZero)
MAKE_FIXEDPOINT_UNARY_FUNC_RETURNING_RAW(MaskIfNonZero)
MAKE_FIXEDPOINT_BINARY_FUNC_RETURNING_RAW(MaskIfEqual)
MAKE_FIXEDPOINT_BINARY_FUNC_RETURNING_RAW(MaskIfNotEqual)
MAKE_FIXEDPOINT_BINARY_FUNC_RETURNING_RAW(MaskIfGreaterThan)
MAKE_FIXEDPOINT_BINARY_FUNC_RETURNING_RAW(MaskIfGreaterThanOrEqual)
MAKE_FIXEDPOINT_BINARY_FUNC_RETURNING_RAW(MaskIfLessThan)
MAKE_FIXEDPOINT_BINARY_FUNC_RETURNING_RAW(MaskIfLessThanOrEqual)
#undef MAKE_FIXEDPOINT_UNARY_FUNC_RETURNING_RAW
#undef MAKE_FIXEDPOINT_BINARY_FUNC_RETURNING_RAW
template <typename tRawType, int tIntegerBits>
FixedPoint<tRawType, tIntegerBits> SelectUsingMask(
tRawType if_mask, FixedPoint<tRawType, tIntegerBits> then_val,
FixedPoint<tRawType, tIntegerBits> else_val) {
return FixedPoint<tRawType, tIntegerBits>::FromRaw(
SelectUsingMask(if_mask, then_val.raw(), else_val.raw()));
}
template <typename tRawType, int tIntegerBits>
bool operator==(FixedPoint<tRawType, tIntegerBits> a,
FixedPoint<tRawType, tIntegerBits> b) {
return All(MaskIfEqual(a.raw(), b.raw()));
}
template <typename tRawType, int tIntegerBits>
bool operator!=(FixedPoint<tRawType, tIntegerBits> a,
FixedPoint<tRawType, tIntegerBits> b) {
return !(a == b);
}
template <typename tRawType, int tIntegerBits>
FixedPoint<tRawType, tIntegerBits> SaturatingAdd(
FixedPoint<tRawType, tIntegerBits> a,
FixedPoint<tRawType, tIntegerBits> b) {
return FixedPoint<tRawType, tIntegerBits>::FromRaw(
SaturatingAdd(a.raw(), b.raw()));
}
template <typename tRawType, int tIntegerBits>
FixedPoint<tRawType, tIntegerBits> AddSaturatingIf16Bit(
FixedPoint<tRawType, tIntegerBits> a,
FixedPoint<tRawType, tIntegerBits> b) {
return FixedPoint<tRawType, tIntegerBits>::FromRaw(
AddSaturatingIf16Bit(a.raw(), b.raw()));
}
// Conversion to floating-point.
template <typename tRawType, int tIntegerBits>
double ToDouble(FixedPoint<tRawType, tIntegerBits> x) {
static_assert(FixedPointRawTypeTraits<tRawType>::kLanes == 1,
"not applicable to SIMD types");
typedef FixedPoint<tRawType, tIntegerBits> F;
return x.raw() / static_cast<double>(1ll << F::kFractionalBits);
}
// Rescale changes the number of IntegerBits and updates the underlying
// raw integer value accordingly.
template <int tIntegerBitsDst, typename tRawType, int tIntegerBitsSrc>
FixedPoint<tRawType, tIntegerBitsDst> Rescale(
FixedPoint<tRawType, tIntegerBitsSrc> x) {
static constexpr int kExponent = tIntegerBitsSrc - tIntegerBitsDst;
FixedPoint<tRawType, tIntegerBitsDst> result;
result.raw() = SaturatingRoundingMultiplyByPOT<kExponent>(x.raw());
return result;
}
// CheckedFixedPointConstant allows specifying fixed-point constants
// initialized as real numbers, in a way that does not compile floating-point
// arithmetic in production code, yet still checks agreement with the
// floating-point expressions when asserts are enabled.
//
// The raw integer value provided is always an int32, encoding a 32-bit
// fixed-point value, regardless of the actual Scalar type. This allows
// writing generic code that applies just as well to the 32-bit and 16-bit
// cases. In the 16-bit case, the raw integer value is internally
// rounding-shifted by 16 bits to the right.
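//
// An informal usage sketch, where F0 stands for any FixedPoint<..., 0> type:
// 0.5 in Q0.31 is the raw value 1 << 30, so a checked constant can be
// written as
//   const F0 one_half = GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(F0, 1 << 30, 0.5);
// With the constants checks enabled, the assert verifies that the raw value
// agrees with the double value; otherwise the raw value is used directly.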
template <typename FixedPointType>
inline typename FixedPointType::ScalarRawType RescaleConstantInitializer(
std::int32_t int32_value) {
typedef typename FixedPointType::ScalarRawType ScalarRawType;
static constexpr int ScalarTypeBits = 8 * sizeof(ScalarRawType);
return static_cast<ScalarRawType>(
RoundingDivideByPOT<std::int32_t>(int32_value, 32 - ScalarTypeBits));
}
#ifdef GEMMLOWP_ENABLE_FIXEDPOINT_CONSTANTS_CHECKS
template <typename FixedPointType>
FixedPointType CheckedFixedPointConstant(std::int32_t raw_value,
double double_value) {
const FixedPointType result = FixedPointType::FromScalarRaw(raw_value);
assert(result == FixedPointType::FromDouble(double_value));
return result;
}
#define GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(FixedPointType, \
ScalarRawInt32Value, DoubleValue) \
(gemmlowp::CheckedFixedPointConstant<FixedPointType>( \
gemmlowp::RescaleConstantInitializer<FixedPointType>( \
ScalarRawInt32Value), \
DoubleValue))
#else
#define GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(FixedPointType, \
ScalarRawInt32Value, DoubleValue) \
(FixedPointType::FromScalarRaw( \
gemmlowp::RescaleConstantInitializer<FixedPointType>( \
ScalarRawInt32Value)))
#endif
// Implementation of exponential function.
// Returns exp(x) for x in [-1/4, 0).
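//
// Informally, the computation below is a truncated Taylor expansion around
// -1/8: writing t = x + 1/8, with t in [-1/8, 1/8),
//   exp(x) = exp(-1/8) * exp(t)
//         ~= exp(-1/8) * (1 + t + t^2/2 + t^3/6 + t^4/24),
// where exp(-1/8) is the precomputed constant_term.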
template <typename tRawType>
FixedPoint<tRawType, 0> exp_on_interval_between_negative_one_quarter_and_0_excl(
FixedPoint<tRawType, 0> a) {
typedef FixedPoint<tRawType, 0> F;
const F constant_term =
GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(F, 1895147668, std::exp(-1.0 / 8.0));
const F constant_1_over_3 =
GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(F, 715827883, 1.0 / 3.0);
// We're evaluating a Taylor expansion around -1/8, so we do the change of
// variable: x = a + 1/8.
// In fixed-point with 0 integer bits, 1/8 is represented by 1 << 28.
F x = a + F::template ConstantPOT<-3>();
F x2 = x * x;
F x3 = x2 * x;
F x4 = x2 * x2;
F x4_over_4 = SaturatingRoundingMultiplyByPOT<-2>(x4);
F x4_over_24_plus_x3_over_6_plus_x2_over_2 =
SaturatingRoundingMultiplyByPOT<-1>(
((x4_over_4 + x3) * constant_1_over_3) + x2);
return AddSaturatingIf16Bit(
constant_term,
constant_term * (x + x4_over_24_plus_x3_over_6_plus_x2_over_2));
}
// Returns exp(x) for x < 0.
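//
// Informally, the decomposition used below: let r = (x mod 1/4) - 1/4, which
// lies in [-1/4, 0) and is handled by the polynomial approximation above.
// The difference r - x is then a non-negative multiple of 1/4, and
//   exp(x) = exp(r) * exp(-(r - x)),
// where each set bit of (r - x) contributes one precomputed exp(-2^k) factor;
// this is what the GEMMLOWP_EXP_BARREL_SHIFTER steps apply. Very negative
// inputs are clamped to a result of zero, and exp(0) is forced to exactly
// one.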
template <typename tRawType, int tIntegerBits>
FixedPoint<tRawType, 0> exp_on_negative_values(
FixedPoint<tRawType, tIntegerBits> a) {
typedef FixedPoint<tRawType, tIntegerBits> InputF;
typedef FixedPoint<tRawType, 0> ResultF;
static constexpr int kFractionalBits = InputF::kFractionalBits;
static constexpr int kIntegerBits = InputF::kIntegerBits;
const InputF kOneQuarter = InputF::template ConstantPOT<-2>();
InputF mask = kOneQuarter - InputF::FromScalarRaw(1);
InputF a_mod_quarter_minus_one_quarter = (a & mask) - kOneQuarter;
ResultF result = exp_on_interval_between_negative_one_quarter_and_0_excl(
Rescale<0>(a_mod_quarter_minus_one_quarter));
tRawType remainder = (a_mod_quarter_minus_one_quarter - a).raw();
#define GEMMLOWP_EXP_BARREL_SHIFTER(Exponent, FixedPointMultiplier) \
if (kIntegerBits > Exponent) { \
const ResultF kMultiplier = GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT( \
ResultF, FixedPointMultiplier, std::exp(-std::pow(2.0, Exponent))); \
static constexpr int kShiftAmount = \
kIntegerBits > Exponent ? kFractionalBits + Exponent : 0; \
result = SelectUsingMask( \
MaskIfNonZero(BitAnd(remainder, Dup<tRawType>(1 << kShiftAmount))), \
result * kMultiplier, result); \
}
GEMMLOWP_EXP_BARREL_SHIFTER(-2, 1672461947);
GEMMLOWP_EXP_BARREL_SHIFTER(-1, 1302514674);
GEMMLOWP_EXP_BARREL_SHIFTER(+0, 790015084);
GEMMLOWP_EXP_BARREL_SHIFTER(+1, 290630308);
GEMMLOWP_EXP_BARREL_SHIFTER(+2, 39332535);
GEMMLOWP_EXP_BARREL_SHIFTER(+3, 720401);
GEMMLOWP_EXP_BARREL_SHIFTER(+4, 242);
#undef GEMMLOWP_EXP_BARREL_SHIFTER
static constexpr int clampB = kIntegerBits > 5 ? 36 - kIntegerBits : 0;
if (kIntegerBits > 5) {
const InputF clamp =
GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(InputF, -(1 << clampB), -32.0);
result = SelectUsingMask(MaskIfLessThan(a, clamp), ResultF::Zero(), result);
}
result = SelectUsingMask(MaskIfZero(a), ResultF::One(), result);
return result;
}
// Implementation of tanh: (1 - exp(-2x)) / (1 + exp(-2x)).
// Returns (1 - x) / (1 + x) for x in (0, 1).
template <typename tRawType>
FixedPoint<tRawType, 0> one_minus_x_over_one_plus_x_for_x_in_0_1(
FixedPoint<tRawType, 0> a) {
typedef FixedPoint<tRawType, 0> F0;
typedef FixedPoint<tRawType, 2> F2;
F0 half_denominator = RoundingHalfSum(a, F0::One());
// Newton-Raphson division
// https://en.wikipedia.org/wiki/Division_algorithm#Newton.E2.80.93Raphson_division
// Refer to that page for the logic behind the 48/17 and 32/17 constants.
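// Informally, restating the reference above: to compute 1 / d for
// d = half_denominator in (1/2, 1), start from the linear estimate
//   x_0 = 48/17 - (32/17) * d
// and iterate
//   x_{n+1} = x_n + x_n * (1 - d * x_n),
// which converges quadratically; the loop below runs three iterations.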
const F2 constant_48_over_17 =
GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(F2, 1515870810, 48.0 / 17.0);
const F2 constant_neg_32_over_17 =
GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(F2, -1010580540, -32.0 / 17.0);
F2 x = constant_48_over_17 + half_denominator * constant_neg_32_over_17;
for (int i = 0; i < 3; i++) {
F2 half_denominator_times_x = half_denominator * x;
F2 one_minus_half_denominator_times_x =
F2::One() - half_denominator_times_x;
x = x + Rescale<2>(x * one_minus_half_denominator_times_x);
}
return Rescale<0>(x - F2::One());
}
// Returns -tanh(x) for x < 0.
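//
// Informally, this follows from the identity quoted above: for x < 0, let
// u = exp(2x), which lies in (0, 1); then
//   -tanh(x) = (1 - u) / (1 + u),
// i.e. one_minus_x_over_one_plus_x_for_x_in_0_1 applied to
// exp_on_negative_values(2x), which is what the function below computes.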
template <typename tRawType, int tIntegerBits>
FixedPoint<tRawType, 0> neg_tanh_on_negative_values(
FixedPoint<tRawType, tIntegerBits> a) {
return one_minus_x_over_one_plus_x_for_x_in_0_1(
exp_on_negative_values(ExactMulByPot<1>(a)));
}
// Returns tanh(x) for any x.
template <typename tRawType, int tIntegerBits>
FixedPoint<tRawType, 0> tanh(FixedPoint<tRawType, tIntegerBits> a) {
typedef FixedPoint<tRawType, tIntegerBits> InputF;
typedef FixedPoint<tRawType, 0> ResultF;
tRawType mask_if_negative = MaskIfLessThan(a, InputF::Zero());
tRawType mask_if_zero = MaskIfZero(a);
InputF n = SelectUsingMask(mask_if_negative, a, -a);
ResultF t = neg_tanh_on_negative_values(n);
return SelectUsingMask(mask_if_zero, ResultF::Zero(),
SelectUsingMask(mask_if_negative, -t, t));
}
// Implementation of logistic function.
// Returns 1 / (1 + x) for x in (0, 1).
template <typename tRawType>
FixedPoint<tRawType, 0> one_over_one_plus_x_for_x_in_0_1(
FixedPoint<tRawType, 0> a) {
typedef FixedPoint<tRawType, 0> F0;
typedef FixedPoint<tRawType, 2> F2;
F0 half_denominator = RoundingHalfSum(a, F0::One());
// Newton-Raphson division
// https://en.wikipedia.org/wiki/Division_algorithm#Newton.E2.80.93Raphson_division
// Refer to that page for the logic behind the 48/17 and 32/17 constants.
const F2 constant_48_over_17 =
GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(F2, 1515870810, 48.0 / 17.0);
const F2 constant_neg_32_over_17 =
GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(F2, -1010580540, -32.0 / 17.0);
F2 x = constant_48_over_17 + half_denominator * constant_neg_32_over_17;
for (int i = 0; i < 3; i++) {
F2 half_denominator_times_x = half_denominator * x;
F2 one_minus_half_denominator_times_x =
F2::One() - half_denominator_times_x;
x = x + Rescale<2>(x * one_minus_half_denominator_times_x);
}
return Rescale<0>(ExactMulByPot<-1>(x));
}
// Returns logistic(x) = 1 / (1 + exp(-x)) for x > 0.
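//
// Informally: for x > 0, u = exp(-x) lies in (0, 1), so
//   logistic(x) = 1 / (1 + u)
// is one_over_one_plus_x_for_x_in_0_1 applied to exp_on_negative_values(-x).
// The general-x version further below uses the symmetry
// logistic(-x) = 1 - logistic(x), with logistic(0) = 1/2 handled separately.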
template <typename tRawType, int tIntegerBits>
FixedPoint<tRawType, 0> logistic_on_positive_values(
FixedPoint<tRawType, tIntegerBits> a) {
return one_over_one_plus_x_for_x_in_0_1(exp_on_negative_values(-a));
}
// Returns logistic(x) = 1 / (1 + exp(-x)) for any x.
template <typename tRawType, int tIntegerBits>
FixedPoint<tRawType, 0> logistic(FixedPoint<tRawType, tIntegerBits> a) {
typedef FixedPoint<tRawType, tIntegerBits> InputF;
typedef FixedPoint<tRawType, 0> ResultF;
tRawType mask_if_positive = MaskIfGreaterThan(a, InputF::Zero());
tRawType mask_if_zero = MaskIfZero(a);
InputF abs_input = SelectUsingMask(mask_if_positive, a, -a);
ResultF result_if_positive = logistic_on_positive_values(abs_input);
ResultF result_if_negative = ResultF::One() - result_if_positive;
const ResultF one_half =
GEMMLOWP_CHECKED_FIXEDPOINT_CONSTANT(ResultF, 1 << 30, 0.5);
return SelectUsingMask(mask_if_zero, one_half,
SelectUsingMask(mask_if_positive, result_if_positive,
result_if_negative));
}
} // end namespace gemmlowp
#ifdef GEMMLOWP_NEON
#include "./fixedpoint_neon.h"
#elif defined(GEMMLOWP_AVX2)
#include "./fixedpoint_avx.h"
#elif defined(GEMMLOWP_SSE4)
#include "./fixedpoint_sse.h"
#elif defined(GEMMLOWP_MSA)
#include "./fixedpoint_msa.h"
#endif
#endif // GEMMLOWP_INTERNAL_FIXEDPOINT_H_

View File

@@ -0,0 +1,384 @@
// Copyright 2015 Google Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// fixedpoint_SSE.h: optimized SSE specializations of the templates
// in fixedpoint.h.
#ifndef GEMMLOWP_INTERNAL_FIXEDPOINT_SSE_H_
#define GEMMLOWP_INTERNAL_FIXEDPOINT_SSE_H_
#include <smmintrin.h>
#include "fixedpoint.h"
namespace gemmlowp {
// SSE intrinsics are not finely typed: there is a single __m128i vector
// type that does not distinguish between "int32x4" and "int16x8" use
// cases, unlike the NEON equivalents. Because we had initially focused
// on int32x4, we did not pay attention and specialized these fixedpoint
// templates directly for __m128i, hardcoding the int32x4 semantics and
// leaving no room for int16x8 semantics. We amend that by adding a separate
// data type, int16x8_m128i, which wraps __m128i while being a distinct
// type.
struct int16x8_m128i {
int16x8_m128i() {}
explicit int16x8_m128i(__m128i w) : v(w) {}
~int16x8_m128i() {}
__m128i v;
};
template <>
struct FixedPointRawTypeTraits<__m128i> {
typedef std::int32_t ScalarRawType;
static constexpr int kLanes = 4;
};
template <>
struct FixedPointRawTypeTraits<int16x8_m128i> {
typedef std::int16_t ScalarRawType;
static constexpr int kLanes = 8;
};
template <>
inline __m128i BitAnd(__m128i a, __m128i b) {
return _mm_and_si128(a, b);
}
template <>
inline int16x8_m128i BitAnd(int16x8_m128i a, int16x8_m128i b) {
return int16x8_m128i(_mm_and_si128(a.v, b.v));
}
template <>
inline __m128i BitOr(__m128i a, __m128i b) {
return _mm_or_si128(a, b);
}
template <>
inline int16x8_m128i BitOr(int16x8_m128i a, int16x8_m128i b) {
return int16x8_m128i(_mm_or_si128(a.v, b.v));
}
template <>
inline __m128i BitXor(__m128i a, __m128i b) {
return _mm_xor_si128(a, b);
}
template <>
inline int16x8_m128i BitXor(int16x8_m128i a, int16x8_m128i b) {
return int16x8_m128i(_mm_xor_si128(a.v, b.v));
}
template <>
inline __m128i BitNot(__m128i a) {
return _mm_andnot_si128(a, _mm_set1_epi32(-1));
}
template <>
inline int16x8_m128i BitNot(int16x8_m128i a) {
return int16x8_m128i(_mm_andnot_si128(a.v, _mm_set1_epi16(-1)));
}
template <>
inline __m128i Add(__m128i a, __m128i b) {
return _mm_add_epi32(a, b);
}
template <>
inline int16x8_m128i Add(int16x8_m128i a, int16x8_m128i b) {
return int16x8_m128i(_mm_add_epi16(a.v, b.v));
}
template <>
inline __m128i Mul(__m128i a, __m128i b) {
return _mm_mullo_epi32(a, b);
}
template <>
inline int16x8_m128i Mul(int16x8_m128i a, int16x8_m128i b) {
return int16x8_m128i(_mm_mullo_epi16(a.v, b.v));
}
template <>
inline __m128i Sub(__m128i a, __m128i b) {
return _mm_sub_epi32(a, b);
}
template <>
inline int16x8_m128i Sub(int16x8_m128i a, int16x8_m128i b) {
return int16x8_m128i(_mm_sub_epi16(a.v, b.v));
}
template <>
inline __m128i Neg(__m128i a) {
return _mm_sign_epi32(a, _mm_set1_epi32(-1));
}
template <>
inline int16x8_m128i Neg(int16x8_m128i a) {
return int16x8_m128i(_mm_sign_epi16(a.v, _mm_set1_epi16(-1)));
}
template <>
inline __m128i ShiftLeft(__m128i a, int offset) {
return _mm_slli_epi32(a, offset);
}
template <>
inline int16x8_m128i ShiftLeft(int16x8_m128i a, int offset) {
return int16x8_m128i(_mm_slli_epi16(a.v, offset));
}
template <>
inline __m128i ShiftRight(__m128i a, int offset) {
return _mm_srai_epi32(a, offset);
}
template <>
inline int16x8_m128i ShiftRight(int16x8_m128i a, int offset) {
return int16x8_m128i(_mm_srai_epi16(a.v, offset));
}
template <>
inline __m128i SelectUsingMask(__m128i if_mask, __m128i then_val,
__m128i else_val) {
// borrowed from Intel's arm_neon_sse.h header.
return _mm_or_si128(_mm_and_si128(if_mask, then_val),
_mm_andnot_si128(if_mask, else_val));
}
template <>
inline int16x8_m128i SelectUsingMask(int16x8_m128i if_mask,
int16x8_m128i then_val,
int16x8_m128i else_val) {
// borrowed from Intel's arm_neon_sse.h header.
return int16x8_m128i(SelectUsingMask(if_mask.v, then_val.v, else_val.v));
}
template <>
inline __m128i MaskIfEqual(__m128i a, __m128i b) {
return _mm_cmpeq_epi32(a, b);
}
template <>
inline int16x8_m128i MaskIfEqual(int16x8_m128i a, int16x8_m128i b) {
return int16x8_m128i(_mm_cmpeq_epi16(a.v, b.v));
}
template <>
inline __m128i MaskIfNotEqual(__m128i a, __m128i b) {
return BitNot(MaskIfEqual(a, b));
}
template <>
inline int16x8_m128i MaskIfNotEqual(int16x8_m128i a, int16x8_m128i b) {
return BitNot(MaskIfEqual(a, b));
}
template <>
inline __m128i MaskIfZero(__m128i a) {
return MaskIfEqual(a, _mm_set1_epi32(0));
}
template <>
inline int16x8_m128i MaskIfZero(int16x8_m128i a) {
return MaskIfEqual(a, int16x8_m128i(_mm_set1_epi16(0)));
}
template <>
inline __m128i MaskIfNonZero(__m128i a) {
return MaskIfNotEqual(a, _mm_set1_epi32(0));
}
template <>
inline int16x8_m128i MaskIfNonZero(int16x8_m128i a) {
return MaskIfNotEqual(a, int16x8_m128i(_mm_set1_epi16(0)));
}
template <>
inline __m128i MaskIfGreaterThan(__m128i a, __m128i b) {
return _mm_cmpgt_epi32(a, b);
}
template <>
inline int16x8_m128i MaskIfGreaterThan(int16x8_m128i a, int16x8_m128i b) {
return int16x8_m128i(_mm_cmpgt_epi16(a.v, b.v));
}
template <>
inline __m128i MaskIfLessThan(__m128i a, __m128i b) {
return _mm_cmplt_epi32(a, b);
}
template <>
inline int16x8_m128i MaskIfLessThan(int16x8_m128i a, int16x8_m128i b) {
return int16x8_m128i(_mm_cmplt_epi16(a.v, b.v));
}
template <>
inline __m128i MaskIfGreaterThanOrEqual(__m128i a, __m128i b) {
return BitNot(MaskIfLessThan(a, b));
}
template <>
inline int16x8_m128i MaskIfGreaterThanOrEqual(int16x8_m128i a,
int16x8_m128i b) {
return BitNot(MaskIfLessThan(a, b));
}
template <>
inline __m128i MaskIfLessThanOrEqual(__m128i a, __m128i b) {
return BitNot(MaskIfGreaterThan(a, b));
}
template <>
inline int16x8_m128i MaskIfLessThanOrEqual(int16x8_m128i a, int16x8_m128i b) {
return BitNot(MaskIfGreaterThan(a, b));
}
/* Assumptions:
- All and Any are used on masks.
- masks are all_ones for true lanes, all_zeroes otherwise.
Hence, All means all 128bits set, and Any means any bit set.
*/
template <>
inline bool All(__m128i a) {
return _mm_testc_si128(a, a);
}
template <>
inline bool All(int16x8_m128i a) {
return _mm_testc_si128(a.v, a.v);
}
template <>
inline bool Any(__m128i a) {
return !_mm_testz_si128(a, a);
}
template <>
inline bool Any(int16x8_m128i a) {
return !_mm_testz_si128(a.v, a.v);
}
template <>
inline __m128i RoundingHalfSum(__m128i a, __m128i b) {
/* __m128i round_bit_mask, a_over_2, b_over_2, round_bit, sum; */
/* We divide the inputs before the add to avoid the overflow and costly test
*/
/* of checking if an overflow occurred on signed add */
/* round_bit_mask = _mm_set1_epi32(1); */
/* a_over_2 = _mm_srai_epi32(a, 1); */
/* b_over_2 = _mm_srai_epi32(b, 1); */
/* sum = Add(a_over_2, b_over_2); */
/* round_bit = _mm_sign_epi32(BitAnd(BitOr(a,b), round_bit_mask), sum); */
/* return Add(sum, round_bit); */
/* Other possibility: detect the overflow and XOR the sign bit if an overflow
* happened. */
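/* Informally, why the XOR fix-up below works: when a + b + 1 wraps around,
* the wrapped sum is off by exactly 2^32, so after the arithmetic shift by
* one the computed half-sum is off by exactly 2^31, i.e. its sign bit is
* flipped. That situation is detected by checking that the sign of
* rounded_half_sum differs from the sign of both a and b (which can only
* happen on overflow), and flipping the sign bit back restores the correct
* result. */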
__m128i one, sign_bit_mask, sum, rounded_half_sum, overflow, result;
one = _mm_set1_epi32(1);
sign_bit_mask = _mm_set1_epi32(0x80000000);
sum = Add(a, b);
rounded_half_sum = _mm_srai_epi32(Add(sum, one), 1);
overflow =
BitAnd(BitAnd(BitXor(a, rounded_half_sum), BitXor(b, rounded_half_sum)),
sign_bit_mask);
result = BitXor(rounded_half_sum, overflow);
return result;
}
template <>
inline int16x8_m128i RoundingHalfSum(int16x8_m128i a, int16x8_m128i b) {
// Idea: go to unsigned to use _mm_avg_epu16,
// borrowed from Intel's arm_neon_sse.h header.
__m128i constant_neg_32768 = _mm_set1_epi16(-32768);
__m128i a_unsigned = _mm_sub_epi16(a.v, constant_neg_32768);
__m128i b_unsigned = _mm_sub_epi16(b.v, constant_neg_32768);
__m128i avg_unsigned = _mm_avg_epu16(a_unsigned, b_unsigned);
__m128i avg = _mm_add_epi16(avg_unsigned, constant_neg_32768);
return int16x8_m128i(avg);
}
template <>
inline __m128i SaturatingRoundingDoublingHighMul(__m128i a, __m128i b) {
__m128i min, saturation_mask, a0_a2, a1_a3, b0_b2, b1_b3;
__m128i a0b0_a2b2, a1b1_a3b3, a0b0_a2b2_rounded, a1b1_a3b3_rounded;
__m128i a0b0_a2b2_rounded_2x, a1b1_a3b3_rounded_2x, result;
__m128i nudge;
// saturation only happens if a == b == INT_MIN
min = _mm_set1_epi32(std::numeric_limits<std::int32_t>::min());
saturation_mask = BitAnd(MaskIfEqual(a, b), MaskIfEqual(a, min));
// a = a0 | a1 | a2 | a3
// b = b0 | b1 | b2 | b3
a0_a2 = a;
a1_a3 = _mm_srli_si128(a, 4);
b0_b2 = b;
b1_b3 = _mm_srli_si128(b, 4);
a0b0_a2b2 = _mm_mul_epi32(a0_a2, b0_b2);
a1b1_a3b3 = _mm_mul_epi32(a1_a3, b1_b3);
// do the rounding and take into account that it will be doubled
nudge = _mm_set1_epi64x(1 << 30);
a0b0_a2b2_rounded = _mm_add_epi64(a0b0_a2b2, nudge);
a1b1_a3b3_rounded = _mm_add_epi64(a1b1_a3b3, nudge);
// do the doubling
a0b0_a2b2_rounded_2x = _mm_slli_epi64(a0b0_a2b2_rounded, 1);
a1b1_a3b3_rounded_2x = _mm_slli_epi64(a1b1_a3b3_rounded, 1);
// get the high part of the products
result = _mm_blend_epi16(_mm_srli_si128(a0b0_a2b2_rounded_2x, 4),
a1b1_a3b3_rounded_2x, 0xcc);
// saturate those which overflowed
return SelectUsingMask(saturation_mask, min, result);
}
template <>
inline int16x8_m128i SaturatingRoundingDoublingHighMul(int16x8_m128i a,
int16x8_m128i b) {
// Idea: use _mm_mulhrs_epi16 then saturate with a bit-operation,
// borrowed from Intel's arm_neon_sse.h header.
__m128i result_unsaturated = _mm_mulhrs_epi16(a.v, b.v);
__m128i saturation_mask =
_mm_cmpeq_epi16(result_unsaturated, _mm_set1_epi16(0x8000));
__m128i result = _mm_xor_si128(result_unsaturated, saturation_mask);
return int16x8_m128i(result);
}
template <>
inline __m128i Dup<__m128i>(std::int32_t x) {
return _mm_set1_epi32(x);
}
template <>
inline int16x8_m128i Dup<int16x8_m128i>(std::int16_t x) {
return int16x8_m128i(_mm_set1_epi16(x));
}
// So far this is only needed for int16.
template <>
inline int16x8_m128i SaturatingAdd(int16x8_m128i a, int16x8_m128i b) {
return int16x8_m128i(_mm_adds_epi16(a.v, b.v));
}
} // end namespace gemmlowp
#endif // GEMMLOWP_INTERNAL_FIXEDPOINT_SSE_H_

View File

@@ -0,0 +1,166 @@
// Copyright 2018 The Gemmlowp Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// detect_platform.h: Sets up macros that control architecture-specific
// features of gemmlowp's implementation.
#ifndef GEMMLOWP_INTERNAL_DETECT_PLATFORM_H_
#define GEMMLOWP_INTERNAL_DETECT_PLATFORM_H_
// Our inline assembly paths assume GCC/Clang syntax.
// Native Client doesn't seem to support inline assembly(?).
#if defined(__GNUC__) && !defined(__native_client__)
#define GEMMLOWP_ALLOW_INLINE_ASM
#endif
// Define macro statement that avoids inlining for GCC.
// For non-GCC, define as empty macro.
#if defined(__GNUC__)
#define GEMMLOWP_NOINLINE __attribute__((noinline))
#else
#define GEMMLOWP_NOINLINE
#endif
// Detect ARM, 32-bit or 64-bit
#ifdef __arm__
#define GEMMLOWP_ARM_32
#endif
#ifdef __aarch64__
#define GEMMLOWP_ARM_64
#endif
#if defined(GEMMLOWP_ARM_32) || defined(GEMMLOWP_ARM_64)
#define GEMMLOWP_ARM
#endif
// Detect MIPS, 32-bit or 64-bit
#if defined(__mips) && !defined(__LP64__)
#define GEMMLOWP_MIPS_32
#endif
#if defined(__mips) && defined(__LP64__)
#define GEMMLOWP_MIPS_64
#endif
#if defined(GEMMLOWP_MIPS_32) || defined(GEMMLOWP_MIPS_64)
#define GEMMLOWP_MIPS
#endif
// Detect x86, 32-bit or 64-bit
#if defined(__i386__) || defined(_M_IX86) || defined(_X86_) || defined(__i386)
#define GEMMLOWP_X86_32
#endif
#if defined(__x86_64__) || defined(_M_X64) || defined(__amd64)
#define GEMMLOWP_X86_64
#endif
#if defined(GEMMLOWP_X86_32) || defined(GEMMLOWP_X86_64)
#define GEMMLOWP_X86
#endif
// Some of our optimized paths use inline assembly, and for now we don't
// bother enabling other optimized paths that would use intrinsics on
// platforms where we can't use inline assembly.
#ifdef GEMMLOWP_ALLOW_INLINE_ASM
// Detect NEON. It's important to check for both tokens.
#if (defined __ARM_NEON) || (defined __ARM_NEON__)
#define GEMMLOWP_NEON
#endif
// Convenience NEON tokens for 32-bit or 64-bit
#if defined(GEMMLOWP_NEON) && defined(GEMMLOWP_ARM_32)
#define GEMMLOWP_NEON_32
#endif
#if defined(GEMMLOWP_NEON) && defined(GEMMLOWP_ARM_64)
#define GEMMLOWP_NEON_64
#endif
// Detect MIPS MSA.
// Limit MSA optimizations to little-endian CPUs for now.
// TODO: Perhaps, eventually support MSA optimizations on big-endian CPUs?
#if defined(GEMMLOWP_MIPS) && (__mips_isa_rev >= 5) && defined(__mips_msa) && \
defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define GEMMLOWP_MSA
#endif
// Convenience MIPS MSA tokens for 32-bit or 64-bit.
#if defined(GEMMLOWP_MSA) && defined(GEMMLOWP_MIPS_32)
#define GEMMLOWP_MSA_32
#endif
#if defined(GEMMLOWP_MSA) && defined(GEMMLOWP_MIPS_64)
#define GEMMLOWP_MSA_64
#endif
// AVX2 paths must be requested explicitly with the compiler define
// -D GEMMLOWP_ENABLE_AVX2.
// Detect AVX2
#if defined(__AVX2__) && defined(GEMMLOWP_ENABLE_AVX2)
#define GEMMLOWP_AVX2
// Detect SSE4.
// MSVC does not have __SSE4_1__ macro, but will enable SSE4
// when AVX is turned on.
#elif defined(__SSE4_1__) || (defined(_MSC_VER) && defined(__AVX__))
#define GEMMLOWP_SSE4
// Detect SSE3.
#elif defined(__SSE3__)
#define GEMMLOWP_SSE3
#endif
// Convenience SSE4 tokens for 32-bit or 64-bit
#if defined(GEMMLOWP_SSE4) && defined(GEMMLOWP_X86_32) && \
!defined(GEMMLOWP_DISABLE_SSE4)
#define GEMMLOWP_SSE4_32
#endif
#if defined(GEMMLOWP_SSE3) && defined(GEMMLOWP_X86_32)
#define GEMMLOWP_SSE3_32
#endif
#if defined(GEMMLOWP_SSE4) && defined(GEMMLOWP_X86_64) && \
!defined(GEMMLOWP_DISABLE_SSE4)
#define GEMMLOWP_SSE4_64
#endif
#if defined(GEMMLOWP_SSE3) && defined(GEMMLOWP_X86_64)
#define GEMMLOWP_SSE3_64
#endif
#if defined(GEMMLOWP_AVX2) && defined(GEMMLOWP_X86_64)
#define GEMMLOWP_AVX2_64
#endif
#if defined(__has_feature)
#if __has_feature(memory_sanitizer)
#include <sanitizer/msan_interface.h>
#define GEMMLOWP_MARK_MEMORY_AS_INITIALIZED __msan_unpoison
#elif __has_feature(address_sanitizer)
#include <sanitizer/asan_interface.h>
#define GEMMLOWP_MARK_MEMORY_AS_INITIALIZED __asan_unpoison_memory_region
#endif
#endif
#endif // GEMMLOWP_ALLOW_INLINE_ASM
// Detect Android. Don't conflate with ARM - we care about tuning
// for non-ARM Android devices too. This can be used in conjunction
// with x86 to tune differently for mobile x86 CPUs (Atom) vs. desktop x86 CPUs.
#if defined(__ANDROID__) || defined(ANDROID)
#define GEMMLOWP_ANDROID
#endif
#endif // GEMMLOWP_INTERNAL_DETECT_PLATFORM_H_

View File

@@ -0,0 +1,11 @@
Copyright (c) 2003-2010 Mark Borgerding
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@@ -0,0 +1,164 @@
/*
Copyright (c) 2003-2010, Mark Borgerding
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/* kiss_fft.h
defines kiss_fft_scalar as either short or a float type
and defines
typedef struct { kiss_fft_scalar r; kiss_fft_scalar i; }kiss_fft_cpx; */
#include "kiss_fft.h"
#include <limits.h>
#define MAXFACTORS 32
/* e.g. an fft of length 128 has 4 factors
as far as kissfft is concerned
4*4*4*2
*/
struct kiss_fft_state{
int nfft;
int inverse;
int factors[2*MAXFACTORS];
kiss_fft_cpx twiddles[1];
};
/*
Explanation of macros dealing with complex math:
C_MUL(m,a,b) : m = a*b
C_FIXDIV( c , div ) : if a fixed point impl., c /= div. noop otherwise
C_SUB( res, a,b) : res = a - b
C_SUBFROM( res , a) : res -= a
C_ADDTO( res , a) : res += a
* */
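/*
For illustration, an informal expansion: in the floating-point build,
C_MUL(m,a,b) is the usual complex product
m.r = a.r*b.r - a.i*b.i
m.i = a.r*b.i + a.i*b.r
while in the fixed-point build each resulting component is passed through
sround(), i.e. a round-to-nearest right shift by FRACBITS.
* */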
#ifdef FIXED_POINT
#if (FIXED_POINT==32)
# define FRACBITS 31
# define SAMPPROD int64_t
#define SAMP_MAX 2147483647
#else
# define FRACBITS 15
# define SAMPPROD int32_t
#define SAMP_MAX 32767
#endif
#define SAMP_MIN -SAMP_MAX
#if defined(CHECK_OVERFLOW)
# define CHECK_OVERFLOW_OP(a,op,b) \
if ( (SAMPPROD)(a) op (SAMPPROD)(b) > SAMP_MAX || (SAMPPROD)(a) op (SAMPPROD)(b) < SAMP_MIN ) { \
fprintf(stderr,"WARNING:overflow @ " __FILE__ "(%d): (%d " #op" %d) = %ld\n",__LINE__,(a),(b),(SAMPPROD)(a) op (SAMPPROD)(b) ); }
#endif
# define smul(a,b) ( (SAMPPROD)(a)*(b) )
# define sround( x ) (kiss_fft_scalar)( ( (x) + (1<<(FRACBITS-1)) ) >> FRACBITS )
# define S_MUL(a,b) sround( smul(a,b) )
# define C_MUL(m,a,b) \
do{ (m).r = sround( smul((a).r,(b).r) - smul((a).i,(b).i) ); \
(m).i = sround( smul((a).r,(b).i) + smul((a).i,(b).r) ); }while(0)
# define DIVSCALAR(x,k) \
(x) = sround( smul( x, SAMP_MAX/k ) )
# define C_FIXDIV(c,div) \
do { DIVSCALAR( (c).r , div); \
DIVSCALAR( (c).i , div); }while (0)
# define C_MULBYSCALAR( c, s ) \
do{ (c).r = sround( smul( (c).r , s ) ) ;\
(c).i = sround( smul( (c).i , s ) ) ; }while(0)
#else /* not FIXED_POINT*/
# define S_MUL(a,b) ( (a)*(b) )
#define C_MUL(m,a,b) \
do{ (m).r = (a).r*(b).r - (a).i*(b).i;\
(m).i = (a).r*(b).i + (a).i*(b).r; }while(0)
# define C_FIXDIV(c,div) /* NOOP */
# define C_MULBYSCALAR( c, s ) \
do{ (c).r *= (s);\
(c).i *= (s); }while(0)
#endif
#ifndef CHECK_OVERFLOW_OP
# define CHECK_OVERFLOW_OP(a,op,b) /* noop */
#endif
#define C_ADD( res, a,b)\
do { \
CHECK_OVERFLOW_OP((a).r,+,(b).r)\
CHECK_OVERFLOW_OP((a).i,+,(b).i)\
(res).r=(a).r+(b).r; (res).i=(a).i+(b).i; \
}while(0)
#define C_SUB( res, a,b)\
do { \
CHECK_OVERFLOW_OP((a).r,-,(b).r)\
CHECK_OVERFLOW_OP((a).i,-,(b).i)\
(res).r=(a).r-(b).r; (res).i=(a).i-(b).i; \
}while(0)
#define C_ADDTO( res , a)\
do { \
CHECK_OVERFLOW_OP((res).r,+,(a).r)\
CHECK_OVERFLOW_OP((res).i,+,(a).i)\
(res).r += (a).r; (res).i += (a).i;\
}while(0)
#define C_SUBFROM( res , a)\
do {\
CHECK_OVERFLOW_OP((res).r,-,(a).r)\
CHECK_OVERFLOW_OP((res).i,-,(a).i)\
(res).r -= (a).r; (res).i -= (a).i; \
}while(0)
#ifdef FIXED_POINT
# define KISS_FFT_COS(phase) floor(.5+SAMP_MAX * cos (phase))
# define KISS_FFT_SIN(phase) floor(.5+SAMP_MAX * sin (phase))
# define HALF_OF(x) ((x)>>1)
#elif defined(USE_SIMD)
# define KISS_FFT_COS(phase) _mm_set1_ps( cos(phase) )
# define KISS_FFT_SIN(phase) _mm_set1_ps( sin(phase) )
# define HALF_OF(x) ((x)*_mm_set1_ps(.5))
#else
# define KISS_FFT_COS(phase) (kiss_fft_scalar) cos(phase)
# define KISS_FFT_SIN(phase) (kiss_fft_scalar) sin(phase)
# define HALF_OF(x) ((x)*.5)
#endif
#define kf_cexp(x,phase) \
do{ \
(x)->r = KISS_FFT_COS(phase);\
(x)->i = KISS_FFT_SIN(phase);\
}while(0)
/* a debugging function */
#define pcpx(c)\
fprintf(stderr,"%g + %gi\n",(double)((c)->r),(double)((c)->i) )
#ifdef KISS_FFT_USE_ALLOCA
// define this to allow use of alloca instead of malloc for temporary buffers
// Temporary buffers are used in two cases:
// 1. FFT sizes that have "bad" factors, i.e. not 2, 3 and 5
// 2. "in-place" FFTs. Notice the quotes, since kissfft does not really do an in-place transform.
#include <alloca.h>
#define KISS_FFT_TMP_ALLOC(nbytes) alloca(nbytes)
#define KISS_FFT_TMP_FREE(ptr)
#else
#define KISS_FFT_TMP_ALLOC(nbytes) KISS_FFT_MALLOC(nbytes)
#define KISS_FFT_TMP_FREE(ptr) KISS_FFT_FREE(ptr)
#endif

View File

@@ -0,0 +1,131 @@
#ifndef KISS_FFT_H
#define KISS_FFT_H
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#ifdef __cplusplus
extern "C" {
#endif
/*
ATTENTION!
If you would like a :
-- a utility that will handle the caching of fft objects
-- real-only (no imaginary time component ) FFT
-- a multi-dimensional FFT
-- a command-line utility to perform ffts
-- a command-line utility to perform fast-convolution filtering
Then see kfc.h kiss_fftr.h kiss_fftnd.h fftutil.c kiss_fastfir.c
in the tools/ directory.
*/
#ifdef USE_SIMD
# include <xmmintrin.h>
# define kiss_fft_scalar __m128
#define KISS_FFT_MALLOC(nbytes) _mm_malloc(nbytes,16)
#define KISS_FFT_FREE _mm_free
#else
#define KISS_FFT_MALLOC(X) (void*)(0) /* Patched. */
#define KISS_FFT_FREE(X) /* Patched. */
#endif
// Patched automatically by download_dependencies.sh so default is 16 bit.
#ifndef FIXED_POINT
#define FIXED_POINT (16)
#endif
// End patch.
#ifdef FIXED_POINT
#include <stdint.h> /* Patched. */
#include <sys/types.h>
# if (FIXED_POINT == 32)
# define kiss_fft_scalar int32_t
# else
# define kiss_fft_scalar int16_t
# endif
#else
# ifndef kiss_fft_scalar
/* default is float */
# define kiss_fft_scalar float
# endif
#endif
typedef struct {
kiss_fft_scalar r;
kiss_fft_scalar i;
}kiss_fft_cpx;
typedef struct kiss_fft_state* kiss_fft_cfg;
/*
* kiss_fft_alloc
*
* Initialize a FFT (or IFFT) algorithm's cfg/state buffer.
*
* typical usage: kiss_fft_cfg mycfg=kiss_fft_alloc(1024,0,NULL,NULL);
*
* The return value from fft_alloc is a cfg buffer used internally
* by the fft routine or NULL.
*
* If lenmem is NULL, then kiss_fft_alloc will allocate a cfg buffer using malloc.
* The returned value should be free()d when done to avoid memory leaks.
*
* The state can be placed in a user supplied buffer 'mem':
* If lenmem is not NULL and mem is not NULL and *lenmem is large enough,
* then the function places the cfg in mem and the size used in *lenmem
* and returns mem.
*
* If lenmem is not NULL and ( mem is NULL or *lenmem is not large enough),
* then the function returns NULL and places the minimum cfg
* buffer size in *lenmem.
* */
kiss_fft_cfg kiss_fft_alloc(int nfft,int inverse_fft,void * mem,size_t * lenmem);
/*
* kiss_fft(cfg,in_out_buf)
*
* Perform an FFT on a complex input buffer.
* for a forward FFT,
* fin should be f[0] , f[1] , ... ,f[nfft-1]
* fout will be F[0] , F[1] , ... ,F[nfft-1]
* Note that each element is complex and can be accessed like
f[k].r and f[k].i
* */
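/*
* An informal usage sketch (error handling omitted; buffer names are just
* placeholders):
*
* kiss_fft_cpx in[1024], out[1024];
* kiss_fft_cfg cfg = kiss_fft_alloc(1024, 0, NULL, NULL);
* ... fill in[] ...
* kiss_fft(cfg, in, out);
* kiss_fft_free(cfg);
* */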
void kiss_fft(kiss_fft_cfg cfg,const kiss_fft_cpx *fin,kiss_fft_cpx *fout);
/*
A more generic version of the above function. It reads its input from every Nth sample.
* */
void kiss_fft_stride(kiss_fft_cfg cfg,const kiss_fft_cpx *fin,kiss_fft_cpx *fout,int fin_stride);
/* If kiss_fft_alloc allocated a buffer, it is one contiguous
buffer and can be simply free()d when no longer needed*/
#define kiss_fft_free free
/*
Cleans up some memory that gets managed internally. Not necessary to call, but it might clean up
your compiler output to call this before you exit.
*/
void kiss_fft_cleanup(void);
/*
* Returns the smallest integer k, such that k>=n and k has only "fast" factors (2,3,5)
*/
int kiss_fft_next_fast_size(int n);
/* for real ffts, we need an even size */
#define kiss_fftr_next_fast_size_real(n) \
(kiss_fft_next_fast_size( ((n)+1)>>1)<<1)
#ifdef __cplusplus
}
#endif
#endif

View File

@@ -0,0 +1,46 @@
#ifndef KISS_FTR_H
#define KISS_FTR_H
#include "kiss_fft.h"
#ifdef __cplusplus
extern "C" {
#endif
/*
Real optimized version can save about 45% cpu time vs. complex fft of a real seq.
*/
typedef struct kiss_fftr_state *kiss_fftr_cfg;
kiss_fftr_cfg kiss_fftr_alloc(int nfft,int inverse_fft,void * mem, size_t * lenmem);
/*
nfft must be even
If you don't care to allocate space, use mem = lenmem = NULL
*/
void kiss_fftr(kiss_fftr_cfg cfg,const kiss_fft_scalar *timedata,kiss_fft_cpx *freqdata);
/*
input timedata has nfft scalar points
output freqdata has nfft/2+1 complex points
*/
void kiss_fftri(kiss_fftr_cfg cfg,const kiss_fft_cpx *freqdata,kiss_fft_scalar *timedata);
/*
input freqdata has nfft/2+1 complex points
output timedata has nfft scalar points
*/
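/*
An informal usage sketch (names are placeholders; nfft must be even):

kiss_fft_scalar time[512];
kiss_fft_cpx freq[512/2 + 1];
kiss_fftr_cfg cfg = kiss_fftr_alloc(512, 0, NULL, NULL);
kiss_fftr(cfg, time, freq);
kiss_fftr_free(cfg);
*/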
#define kiss_fftr_free free
#ifdef __cplusplus
}
#endif
#endif

View File

@@ -0,0 +1,203 @@
/* Copyright 2020 Google LLC. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef RUY_RUY_PROFILER_INSTRUMENTATION_H_
#define RUY_RUY_PROFILER_INSTRUMENTATION_H_
#ifdef RUY_PROFILER
#include <stdio.h>
#include <mutex>
#include <vector>
#endif
namespace ruy {
namespace profiler {
#ifdef RUY_PROFILER
// A label is how a code scope is annotated to appear in profiles.
// The stacks that are sampled by the profiler are stacks of such labels.
// A label consists of a literal string, plus optional integer arguments.
class Label {
public:
Label() {}
template <typename... Args>
explicit Label(Args... args) {
Set(args...);
}
void Set(const char* format) {
format_ = format;
args_count_ = 0;
}
template <typename... Args>
void Set(const char* format, Args... args) {
format_ = format;
args_count_ = sizeof...(args);
SetArgs(0, args...);
}
void operator=(const Label& other);
bool operator==(const Label& other) const;
std::string Formatted() const;
const char* format() const { return format_; }
private:
void SetArgs(int position, int arg0) { args_[position] = arg0; }
template <typename... Args>
void SetArgs(int position, int arg0, Args... args) {
SetArgs(position, arg0);
SetArgs(position + 1, args...);
}
static constexpr int kMaxArgs = 4;
const char* format_ = nullptr;
int args_count_ = 0;
int args_[kMaxArgs];
};
namespace detail {
// Forward-declaration, see class ThreadStack below.
class ThreadStack;
bool& GlobalIsProfilerRunning();
// Returns the global vector of pointers to all stacks, there being one stack
// per thread executing instrumented code.
std::vector<ThreadStack*>* GlobalAllThreadStacks();
// Returns the mutex to be locked around any access to GlobalAllThreadStacks().
std::mutex* GlobalsMutex();
// Returns the thread-local stack, specific to the current thread.
ThreadStack* ThreadLocalThreadStack();
// This 'stack' is what may be more appropriately called a 'pseudostack':
// It contains Label entries that are 'manually' entered by instrumentation
// code. It's unrelated to real call stacks.
struct Stack {
std::uint32_t id = 0;
static constexpr int kMaxSize = 64;
int size = 0;
Label labels[kMaxSize];
};
// Returns the buffer byte size required by CopyToBuffer.
int GetBufferSize(const Stack& stack);
// Copies this Stack into a byte buffer, called a 'sample'.
void CopyToBuffer(const Stack& stack, char* dst);
// Populates this Stack from an existing sample buffer, typically
// produced by CopyToBuffer.
void ReadFromBuffer(const char* src, Stack* stack);
// ThreadStack is meant to be used as a thread-local singleton, assigning to
// each thread a Stack object holding its pseudo-stack of profile labels,
// plus a mutex allowing to synchronize accesses to this pseudo-stack between
// this thread and a possible profiler thread sampling it.
class ThreadStack {
public:
ThreadStack();
~ThreadStack();
const Stack& stack() const { return stack_; }
// Returns the mutex to lock around any access to this stack. Each stack is
// accessed by potentially two threads: the thread that it belongs to
// (which calls Push and Pop) and the profiler thread during profiling
// (which calls CopyToBuffer).
std::mutex& Mutex() const { return mutex_; }
// Pushes a new label on the top of this Stack.
template <typename... Args>
void Push(Args... args) {
// This mutex locking is needed to guard against race conditions as both
// the current thread and the profiler thread may be concurrently accessing
// this stack. In addition to that, this mutex locking also serves the other
// purpose of acting as a barrier (of compiler code reordering, of runtime
// CPU instruction reordering, and of memory access reordering), which
// gives a measure of correctness to this profiler. The downside is some
// latency. As this lock will be uncontended most of the times, the cost
// should be roughly that of an sequentially-consistent atomic access,
// comparable to an access to the level of CPU data cache that is shared
// among all cores, typically 60 cycles on current ARM CPUs, plus side
// effects from barrier instructions.
std::lock_guard<std::mutex> lock(mutex_);
// Avoid overrunning the stack, even in 'release' builds. This profiling
// instrumentation code should not ship in release builds anyway; the
// overhead of this check is negligible, and overrunning a stack array would
// be bad.
if (stack_.size >= Stack::kMaxSize) {
abort();
}
stack_.labels[stack_.size++].Set(args...);
}
// Pops the top-most label from this Stack.
void Pop() {
// See the comment in Push about this lock. While it would be tempting to
// try to remove this lock and just atomically decrement size_ with a
// store-release, that would not necessarily be a substitute for all of the
// purposes that this lock serves, or if it was done carefully to serve all
// of the same purposes, then that wouldn't be faster than this (mostly
// uncontended) lock.
std::lock_guard<std::mutex> lock(mutex_);
stack_.size--;
}
private:
mutable std::mutex mutex_;
Stack stack_;
};
} // namespace detail
// RAII user-facing way to construct Labels associated with their life scope
// and get them pushed to / popped from the current thread stack.
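//
// An informal usage sketch (the function and label text are illustrative
// only): instrumented code typically creates a ScopeLabel at the top of a
// scope, e.g.
//
//   void PackLhs(int rows, int cols) {
//     ruy::profiler::ScopeLabel label("PackLhs %dx%d", rows, cols);
//     // ... work attributed to this label ...
//   }
//
// The label is pushed on construction and popped on destruction.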
class ScopeLabel {
public:
template <typename... Args>
ScopeLabel(Args... args) : thread_stack_(detail::ThreadLocalThreadStack()) {
thread_stack_->Push(args...);
}
~ScopeLabel() { thread_stack_->Pop(); }
private:
detail::ThreadStack* thread_stack_;
};
#else // no RUY_PROFILER
class ScopeLabel {
public:
template <typename... Args>
explicit ScopeLabel(Args...) {}
// This destructor is needed to consistently silence clang's -Wunused-variable
// which seems to trigger semi-randomly.
~ScopeLabel() {}
};
#endif
} // namespace profiler
} // namespace ruy
#endif // RUY_RUY_PROFILER_INSTRUMENTATION_H_