avro的安装与使用

2018/07/11

介绍

Avro是一个数据序列化系统,设计用于支持大批量数据交换的应用。它的主要特点有:支持二进制序列化方式,可以便捷,快速地处理大量数据;动态语言友好,Avro提供的机制使动态语言可以方便地处理Avro数据。

数据序列化/反序列化(data serialization/deserialization)

支持两种序列化编码方式:二进制编码和JSON编码。

使用二进制编码会高效序列化,并且序列化后得到的结果会比较小;而JSON一般用于调试系统或是基于WEB的应用。

实例

依赖

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.8.0</version>
</dependency>

编译插件

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.8.0</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.6</source>
    <target>1.6</target>
  </configuration>
</plugin>

src/main下新建avro文件夹并新建user.avsc文件

{"namespace": "cn.md31.avro.test.bean",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number",  "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}

其中

namespace在java项目中翻译成包名

name是类名

fields就是配置的属性

注意:必须配置type为record

生成User的java文件

java -jar /path/to/avro-tools-1.8.0.jar compile schema <schema file> <destination>
或 mvn complie

初始化User对象

List<User> userList = new ArrayList<User>();
User user1 = new User("user1", 10, "red");
User user2 = new User();
user2.setName("user2");
user2.setFavoriteNumber(11);
user2.setFavoriteColor("white");
User user3 = User.newBuilder()
    .setName("user3")
    .setFavoriteNumber(12)
    .setFavoriteColor("black")
    .build();
userList.add(user1);
userList.add(user2);
userList.add(user3);

序列化到文件中

DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
dataFileWriter.create(userList.get(0).getSchema(), new File(fileName));
for (User user: userList) {
  dataFileWriter.append(user);
}
dataFileWriter.close();

从文件中反序列化对象

File file = new File(fileName);
DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader);
User user = null;
System.out.println("-------------deserializeAvroFromFile-------------");
while (dataFileReader.hasNext()) {
  user = dataFileReader.next(user);
  System.out.println(user);
}

序列化对象成byte数组

ByteArrayOutputStream baos = new ByteArrayOutputStream();
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
dataFileWriter.create(userList.get(0).getSchema(), baos);
for (User user: userList) {
  dataFileWriter.append(user);
}
dataFileWriter.close();
return baos.toByteArray();

从byte数组中反序列化成对象

SeekableByteArrayInput sbai = baos.toByteArray();
DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<User>(sbai, userDatumReader);
System.out.println("-------------deserialzeAvroFromByteArray-------------");
User readUser = null;
while (dataFileReader.hasNext()) {
  readUser = dataFileReader.next(readUser);
  System.out.println(readUser);
}

目录