Data Collection

The data collection function enriches the data in the platform to meet subsequent analysis needs:

  • Regular import: imports collected Excel files into the database according to predefined rules
  • Data reporting: users can collect or modify data through forms or lists, with support for mobile operation
  • Filling process: supports multiple contributors, multi-branch flows, sub-processes, and a countersignature review mechanism
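The "regular import" idea above, mapping collected file columns onto database fields through predefined rules, can be sketched as follows. This is a minimal illustration using the standard library; the rule table, field names, and `regular_import` helper are hypothetical, not the platform's actual API.

```python
import sqlite3

# Hypothetical predefined rules: source column -> (target field, type cast).
RULES = {
    "Region": ("region", str),
    "Sales": ("sales", float),
}

def regular_import(conn, table, rows, rules=RULES):
    """Apply the predefined rules to collected rows and load them into the database."""
    fields = [field for field, _ in rules.values()]
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(fields)})")
    for row in rows:
        values = [cast(row[src]) for src, (_, cast) in rules.items()]
        placeholders = ", ".join("?" * len(values))
        conn.execute(
            f"INSERT INTO {table} ({', '.join(fields)}) VALUES ({placeholders})",
            values,
        )
    conn.commit()

# Rows as they might arrive from a parsed Excel sheet (values still strings).
collected = [{"Region": "North", "Sales": "1200.5"}, {"Region": "South", "Sales": "980"}]
conn = sqlite3.connect(":memory:")
regular_import(conn, "sales_report", collected)
total = conn.execute("SELECT SUM(sales) FROM sales_report").fetchone()[0]
print(total)  # 2180.5
```

In a real deployment the rows would come from an Excel parser and the connection would point at the production database; the rule-driven casting step is what makes the import "regular" rather than ad hoc.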

Data Processing

Data processing refers to cleaning, transforming, and loading data before analysis:

  • Self-service data processing, with no dependence on database developers
  • A visual data processing workflow that is easy to operate
  • Built-in preprocessing nodes for common row and column operations
  • SQL-compatible syntax for extending ETL functions
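The "preprocessing nodes chained into a workflow" model described above can be sketched in a few lines: each node is a function over a list of rows, and the pipeline applies them in order, just as the visual canvas would. The node names (`filter_rows`, `select_columns`) are illustrative, not the product's actual node set.

```python
# Each "node" is a function over a list of row dicts, chained like the visual workflow.
def filter_rows(pred):
    """Row node: keep only rows matching the predicate."""
    return lambda rows: [r for r in rows if pred(r)]

def select_columns(*cols):
    """Column node: keep only the named fields."""
    return lambda rows: [{c: r[c] for c in cols} for r in rows]

def run_pipeline(rows, nodes):
    for node in nodes:
        rows = node(rows)
    return rows

raw = [
    {"id": 1, "amount": 50, "status": "ok"},
    {"id": 2, "amount": -3, "status": "bad"},
    {"id": 3, "amount": 70, "status": "ok"},
]
clean = run_pipeline(raw, [
    filter_rows(lambda r: r["status"] == "ok"),  # drop invalid records
    select_columns("id", "amount"),              # keep only the needed fields
])
print(clean)  # [{'id': 1, 'amount': 50}, {'id': 3, 'amount': 70}]
```

The SQL-compatibility bullet corresponds to letting a node hold a SQL expression instead of a Python callable; the chaining structure stays the same.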

Self-service ETL has the following four characteristics:

  • Integration: built into Synapse, so it can be used without a separate deployment.
  • Visualization: a fully interface-driven tool that business personnel can operate directly.
  • High performance: the distributed engine adopts an advanced architecture and handles massive data volumes, scaling up to the PB level; its data processing performance is roughly 10 times that of comparable traditional tools.
  • Rich functionality: a large library of components covers both general-purpose and advanced data processing.

Cross-Database Integration

Enterprise data is stored on different servers and often in different types of databases. When users need to query broadly rather than within a single database, related queries must be performed across multiple databases. Instead of the traditional approach of extracting everything into a unified database through ETL, we provide cross-database joint data sources.

Highlights of cross-database integration

  • No data landing
    Synapse provides direct cross-database queries through a built-in cross-database query engine. Joins are performed in memory without landing data to disk, eliminating the intermediate extraction step and ensuring the queried data stays current.
  • Quick deployment, ready out of the box
    The cross-database engine is built into the system; no additional installation or deployment is required.
  • Fast processing, good scalability
    For cross-database queries over massive data, the built-in engine scales linearly and processes in parallel to keep pace with enterprise growth.
  • High-performance applications
    Cross-database data sources can be used when defining data sets. Ordinary analysis runs directly against the connected data source, which works well while data volumes are small; once the data grows past a certain point, however, report performance hits a serious bottleneck. At that point the cache library mechanism can be used directly, which is the system's most important guarantee of sustained performance and scalability.

Data Control & Query

Synapse supports access to a rich set of data sources, but the accessed business databases generally cannot be used directly for data analysis. Therefore, in the data-fetching step before report development, the required data is integrated into a data set. This can be understood as obtaining the data we need from the database, and it is the preparatory step before interactive analysis between the data analyst and the end business user.

Highlights of data query capabilities

  • Visual data preparation
    Synapse provides powerful interface-based data preparation capabilities: users build the data sets their business requires on top of the relationships in the source data. For example, visual query lets users drag and drop on a visual interface to build a data model easily, so analysts can obtain prepared data faster and more intuitively, and make business decisions more quickly and intelligently.
  • Cross-database query support
    Self-service data sets support cross-database queries. When users need to query broadly rather than within a single database, they can query across multiple databases at once: the cross-database query function associates different data sources and solves the problem of uniformly accessing data behind different interfaces.
  • Cache mechanism
    The system supports data sets with data extraction: self-service data sets, visual data sets, SQL data sets, stored procedure data sets, Java data sets, ad hoc queries, and perspective analysis. Most of these can extract raw data from the source database into a cache library, so that large result sets are returned in seconds and overall system performance improves.
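The extract-to-cache idea behind the bullet above can be sketched as a query layer that pulls results from the source database once and serves repeated requests from the cache. The `query` helper and cache-keying-by-SQL-string scheme are simplifying assumptions, not the product's actual caching design.

```python
import sqlite3

# Source business database (stand-in for the production system).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE facts (k TEXT, v INTEGER)")
source.executemany("INSERT INTO facts VALUES (?, ?)", [("a", 1), ("b", 2)])

cache = {}  # cache library: extracted results keyed by query text

def query(sql):
    """Serve repeated queries from the extracted cache instead of the source."""
    if sql not in cache:
        cache[sql] = source.execute(sql).fetchall()  # extract once from source
    return cache[sql]

first = query("SELECT v FROM facts ORDER BY k")    # hits the source database
second = query("SELECT v FROM facts ORDER BY k")   # served from the cache
print(first)  # [(1,), (2,)]
```

Real cache libraries add invalidation and scheduled refresh, which this sketch omits; the point is that the expensive trip to the source happens once per extraction, not once per report view.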

Massively Parallel Processing DW

With the maturing of distributed and parallel technology, MPP engine databases have come to show strong high-throughput, low-latency computing capabilities: an MPP architecture can return results over hundreds of millions of rows in seconds. The Synapse MPP DW columnar database is aimed primarily at the data analysis field.

It bundles its own storage and compute, independently achieves full high availability, and supports complete SQL syntax including JOIN, giving it clear technical advantages. Compared with the Hadoop ecosystem, using a database for big data processing is easier, with a lower learning cost and greater flexibility.

We have done meticulous work in the computing layer to squeeze the most out of the hardware and improve query speed, implementing technologies such as single-machine multi-core parallelism, distributed computing, vectorized execution, SIMD instructions, and code generation.

Starting from the needs of data analysis scenarios, we developed a new high-efficiency columnar storage engine that provides ordered data storage, primary key indexes, sparse indexes, data sharding, data partitioning, TTL, and master/replica replication. We also offer migration from any existing DW, along with an assessment of the hardware the enterprise's data scale requires.
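Two of the storage-engine features named above, ordered column storage and a sparse index, combine in a simple way: because keys are stored in order, the index only needs to record the first key of each block, and a lookup can skip every block except one. This toy sketch (block size, names, and layout are all illustrative assumptions) shows the mechanism:

```python
# Minimal column-store sketch: values kept per column in key order, with a
# sparse index recording only the first key of each block.
BLOCK = 2

keys   = [1, 2, 3, 4, 5, 6]          # stored in sorted order
values = [10, 20, 30, 40, 50, 60]
sparse_index = [(i, keys[i]) for i in range(0, len(keys), BLOCK)]

def lookup(key):
    """Find the last block whose first key <= key, then scan only that block."""
    start = 0
    for pos, first_key in sparse_index:
        if first_key <= key:
            start = pos
    for i in range(start, min(start + BLOCK, len(keys))):
        if keys[i] == key:
            return values[i]
    return None  # key falls outside the stored range or is absent

print(lookup(5))  # 50
```

With real block sizes (thousands of rows per index mark), the sparse index stays small enough to live in memory while still letting the engine skip almost all of the data on disk, which is what makes second-level responses over very large tables feasible.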