
The data collection function enriches the data in the platform to meet subsequent analysis needs.
Data processing refers to cleaning, converting, and loading data before it is analyzed.
Self-service ETL has the following four characteristics:
Integration: Built into Synapse, it can be used without a separate deployment.
Visualization: Fully interface-driven with direct operation, so business personnel can take part.
High performance: Its distributed computing engine is built on an industry-leading architecture and handles massive data volumes, scaling up to the petabyte (PB) level. Its data processing performance is 10 times that of comparable traditional tools.
Powerful functionality: A large set of components covers both general and advanced data processing.
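The clean, convert, and load sequence that defines data processing can be illustrated with a minimal Python sketch. It is not Synapse's self-service ETL component: the sample records, the functions `clean`, `convert`, and `load`, and the SQLite target table `sales` are all hypothetical, chosen only to show the three steps running in order.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
RAW_ROWS = [
    {"region": " North ", "amount": "120.50"},
    {"region": "south", "amount": "80"},
    {"region": "", "amount": "15"},        # missing region: dropped during cleaning
    {"region": "North", "amount": "n/a"},  # non-numeric amount: dropped during cleaning
]


def clean(rows):
    """Cleaning: trim whitespace, drop rows with no region or a non-numeric amount."""
    cleaned = []
    for row in rows:
        region = row["region"].strip()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        if region:
            cleaned.append({"region": region, "amount": amount})
    return cleaned


def convert(rows):
    """Conversion: normalize region names to title case."""
    return [{"region": r["region"].title(), "amount": r["amount"]} for r in rows]


def load(rows, conn):
    """Loading: write the processed rows into the analysis table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (:region, :amount)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(convert(clean(RAW_ROWS)), conn)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('North', 120.5), ('South', 80.0)]
```

Keeping each step as a separate function mirrors how ETL components are chained: each one takes the previous step's output, so steps can be reordered or replaced independently.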
Enterprise data is often spread across different servers, or even different types of databases. When a query is not limited to one database, the related data must be joined and analyzed across multiple databases. Instead of the traditional approach of first extracting everything into a unified database through ETL, we provide cross-database joint data sources.
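As a rough analogy for querying across databases without first consolidating them, the Python sketch below uses SQLite's `ATTACH DATABASE` to join tables that live in two separate databases within a single query. It only illustrates the idea; it is not how Synapse implements cross-database data sources, and the `customers` and `orders_db.orders` tables are hypothetical.

```python
import sqlite3

# The main connection holds a hypothetical CRM database.
conn = sqlite3.connect(":memory:")

# Attach a second, independent database under the alias "orders_db".
# ATTACH is issued before any INSERT so that no transaction is open yet.
conn.execute("ATTACH DATABASE ':memory:' AS orders_db")

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders_db.orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany("INSERT INTO main.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany(
    "INSERT INTO orders_db.orders VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)],
)

# One query joins data from both databases; nothing is copied into a unified store first.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.total) AS revenue
    FROM main.customers AS c
    JOIN orders_db.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)  # [('Acme', 350.0), ('Globex', 75.0)]
```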
Highlights of cross-database integration
Synapse can connect to a wide range of data sources, but the connected business (operational) databases generally cannot be used directly for data analysis. Therefore, during data retrieval before report development, the required data is integrated into a data collection. This step can be understood as pulling exactly the data we need from the source databases, and it also prepares for the interactive analysis between data analysts and end business users.
Highlights of data query capabilities
With the maturing of distributed and parallel technologies, MPP (massively parallel processing) databases have demonstrated strong high-throughput, low-latency computing capabilities. An MPP architecture can return query results over hundreds of millions of rows within seconds. The Synapse MPP DW columnar database is used mainly for data analysis.
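The core MPP pattern, scatter and gather, can be sketched in Python: the data is split into partitions, each worker computes a partial aggregate over its own partition, and a coordinator merges the partial results. In this sketch, threads from `concurrent.futures` stand in for separate nodes, so it shows the data flow rather than a real speedup; the round-robin partitioning, the partition count, and the function names are assumptions, not details of the Synapse engine.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_parts):
    """Split the input round-robin into n_parts partitions (one per simulated node)."""
    return [values[i::n_parts] for i in range(n_parts)]


def partial_aggregate(part):
    """Each simulated node computes a local (count, sum) over its own partition."""
    return len(part), sum(part)


def mpp_average(values, n_parts=4):
    """Scatter partitions to workers, then gather and merge the partial aggregates."""
    parts = partition(values, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_aggregate, parts))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(s for _, s in partials)
    return total_sum / total_count


data = list(range(1, 1001))
print(mpp_average(data))  # 500.5
```

Each node returns `(count, sum)` rather than a local average because averages of unequal partitions cannot be merged correctly, whereas counts and sums can simply be added at the coordinator.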
It has built-in storage and computing, achieves high availability on its own, and supports complete SQL syntax, including JOIN, which gives it clear technical advantages. Compared with the Hadoop ecosystem, processing big data with a database is easier, with a lower learning cost and greater flexibility.
We have carefully optimized the computing layer to extract as much performance as possible from the hardware and to improve query speed. The engine implements several key techniques, including single-machine multi-core parallelism, distributed computing, vectorized execution, SIMD instructions, and code generation.
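Vectorized execution means that operators process a batch of column values per call instead of one row at a time. This amortizes per-call overhead and produces tight loops that map well onto SIMD instructions. The Python sketch below illustrates only the batch-at-a-time control flow (Python itself does not emit SIMD code); the batch size of 4 and the operator functions are hypothetical, not Synapse internals.

```python
from array import array

BATCH_SIZE = 4  # hypothetical; real engines typically use much larger batches


def scan_batches(column, batch_size=BATCH_SIZE):
    """Scan operator: yield the column in fixed-size batches rather than row by row."""
    for start in range(0, len(column), batch_size):
        yield column[start:start + batch_size]


def filter_greater(batch, threshold):
    """Filter operator: evaluate the predicate over a whole batch in one call."""
    return array(batch.typecode, (v for v in batch if v > threshold))


def vectorized_sum_where(column, threshold):
    """Equivalent of SELECT SUM(x) WHERE x > threshold, run one batch at a time."""
    total = 0
    for batch in scan_batches(column):
        total += sum(filter_greater(batch, threshold))
    return total


prices = array("q", [5, 12, 7, 30, 1, 18, 25, 3, 9, 40])
print(vectorized_sum_where(prices, 10))  # 125
```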
Starting from the needs of data analysis scenarios, we designed and developed a new high-efficiency columnar storage engine that provides ordered data storage, primary key indexes, sparse indexes, data sharding, data partitioning, TTL (time-to-live) expiration, and primary/backup replication. We provide migration from and implementation of any DW, and assess the hardware requirements for an enterprise's data scale.
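A sparse index relies on ordered storage. Instead of indexing every row, the engine records the key of the first row in every fixed-size block of rows (a granule). A lookup binary-searches these marks to find the one granule that can contain the key and reads only that granule. The Python sketch below shows the generic technique using `bisect`, assuming unique, sorted primary keys; the class name and the granule size of 4 are hypothetical, and this is not Synapse's actual storage format.

```python
import bisect


class SparseIndex:
    """Sparse index over unique keys stored in sorted order: one mark per granule."""

    def __init__(self, sorted_keys, granule_size=4):
        self.keys = sorted_keys
        self.granule_size = granule_size
        # Keep only the first key of each granule, so the index stays small.
        self.marks = sorted_keys[::granule_size]

    def lookup(self, key):
        """Return (row position or None, number of rows read from storage)."""
        # Find the last granule whose first key is <= key.
        g = bisect.bisect_right(self.marks, key) - 1
        if g < 0:
            return None, 0  # key is smaller than every stored key: no rows read
        start = g * self.granule_size
        granule = self.keys[start:start + self.granule_size]
        for offset, k in enumerate(granule):
            if k == key:
                return start + offset, len(granule)
        return None, len(granule)


index = SparseIndex(list(range(0, 40, 2)))  # keys 0, 2, ..., 38 in 5 granules
print(index.lookup(18))  # (9, 4)
```

The granule size is the trade-off: larger granules make the index smaller but force more rows to be read per lookup.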