Version: 2.1.0 ~ 2.5.0

Save to file

Scrcpy's video stream contains raw encoded video frames in various codecs (H.264/AVC, H.265/HEVC, and AV1). A few video players (e.g. VLC) can play these raw streams directly, but most others require a container format (e.g. MP4): the video stream plus some metadata.

Introduction

There are two ways to convert the raw encoded video frames into a video file: re-encode the video, or mux the video stream directly. Re-encoding can convert the video frames into another codec, or provide better compression. Muxing is a very simple process that reuses the original video stream directly, which saves CPU power and memory. Here is a more detailed comparison:

  • Codec and configuration: Re-encoding the video allows you to use any codec and configuration. Because PCs are often more powerful than mobile devices, they might be able to achieve the same quality at a lower bitrate (thus smaller files). When muxing the video stream, the codec and configuration are always the same as captured.
  • Quality: Most video encoding processes are lossy. The quality of the re-encoded video will be slightly lower than the original video. When muxing the video stream, the quality is unchanged.
  • Processing power and memory: Re-encoding the video requires more CPU/GPU power and memory. Muxing a video requires almost no CPU power and memory.
  • Code complexity: Because the video resolution can change with device orientation, it's more complex to re-encode the video. Muxing the video stream doesn't require any additional code to handle that.

Re-encode the video

The MediaRecorder API can record the rendering canvas (from the TinyH264 decoder or the WebCodecs decoder) into an MP4 or WebM file.

Note that for WebGLVideoFrameRenderer, the enableCapture parameter must be set to true.

MDN has an example of using MediaRecorder. There is nothing special about using MediaRecorder with our decoder canvases.
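
Below is a minimal sketch of this approach. It assumes canvas is the element the decoder renders into, and that the browser supports recording to WebM; adjust the MIME type and frame rate to your needs.

// A minimal sketch: record the decoder's rendering canvas with MediaRecorder.
// `canvas` is assumed to be the canvas element the decoder renders into.
declare const canvas: HTMLCanvasElement;

const stream = canvas.captureStream(60); // capture at up to 60 fps
const recorder = new MediaRecorder(stream, { mimeType: "video/webm" });

const chunks: Blob[] = [];
recorder.ondataavailable = (e) => {
    if (e.data.size > 0) {
        chunks.push(e.data);
    }
};
recorder.onstop = () => {
    const file = new Blob(chunks, { type: "video/webm" });
    // Save `file` to disk, e.g. via a download link
};

recorder.start(1000); // emit a chunk every second
// ...later
recorder.stop();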

Muxing the video stream

A muxer library is required to mux the video stream into a container format.

But before that, for H.264 and H.265, we need to know the difference between their two stream formats.

AVCC/HEVC vs Annex B for H.264/H.265 codecs

H.264 and H.265 each have two different stream formats.

Take H.264 as an example:

H.264/AVC is a joint effort between the ITU-T and ISO, thus it has two names. The names "H.264" and "AVC" are usually used interchangeably; however, they actually refer to two different specifications. Most parts of them (like how to encode and decode) are identical, but how to store the packets in files differs:

  • The ITU-T H.264 standard (https://www.itu.int/rec/T-REC-H.264) defines a bit stream format for storing H.264 packets. This format doesn't have an official name, but since it's defined in Annex B of the specification, it's commonly referred to as the "Annex B format". It's generally used when transmitting raw H.264 streams, and in .ts and .h264 files.
  • The ISO/IEC MPEG-4 AVC standard (https://www.iso.org/obp/ui/en/#iso:std:iso-iec:14496:-10:ed-10:v1:en) uses various C structure-like formats to store those packets. This format also doesn't have an official name, but it's commonly referred to as "AVC" or "AVCC", and is used by containers like MP4 and MKV.

H.265 has a similar story: its Annex B format is identical to H.264's, while its ISO format uses different structures and is called "HEVC".

The Scrcpy server uses the Android MediaCodec API, which produces Annex B formatted H.264/H.265 streams, and transmits them to clients as-is. But to save them into files, they might need to be converted to the AVCC/HEVC format.
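
As a rough illustration (using a made-up 4-byte NAL unit), the two formats lay out the same data like this:

// Annex B: every NAL unit is preceded by a start code (00 00 00 01 or 00 00 01),
// and the SPS/PPS NAL units travel inside the stream itself.
const annexB = new Uint8Array([
    0x00, 0x00, 0x00, 0x01, // start code
    0x65, 0x88, 0x84, 0x21, // NAL unit payload (hypothetical)
]);

// AVCC/HEVC: every NAL unit is preceded by its length (usually 4 bytes, big-endian),
// and the SPS/PPS move into the container metadata (e.g. AVCDecoderConfigurationRecord).
const avcc = new Uint8Array([
    0x00, 0x00, 0x00, 0x04, // NAL unit length = 4
    0x65, 0x88, 0x84, 0x21, // same NAL unit payload
]);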

Use Mediabunny

One muxer library we recommend is Mediabunny. It supports multiple sources (including our encoded video frames) and multiple container formats (mp4, mov, webm and mkv). It's also the successor of mp4-muxer and webm-muxer.

First, install the required packages:

npm install @yume-chan/scrcpy @yume-chan/stream-extra mediabunny

To connect the Scrcpy video stream to Mediabunny, we need to create an Output instance and pipe Scrcpy's videoPacketStream (from createMediaStreamTransformer or AdbScrcpyClient.videoStream) to it.

This example uses MkvOutputFormat, which produces a .mkv file. Technically, MKV files only support the AVCC/HEVC format for H.264/H.265, but most video players (VLC, Windows Media Player, etc.) and tools (FFmpeg) can also handle Annex B formatted streams, so we don't need to convert the stream.

To output a .mp4 file, replace MkvOutputFormat with Mp4OutputFormat. MP4 only supports the AVCC/HEVC format for H.264/H.265; fortunately, Mediabunny will do the conversion for us.

This example uses BufferTarget, which buffers the video file in memory. Check the Mediabunny documentation for other output targets.

import type { VideoCodec } from "mediabunny";
import {
    BufferTarget,
    EncodedPacket,
    EncodedVideoPacketSource,
    MkvOutputFormat,
    Output,
} from "mediabunny";

import { getUint32LittleEndian } from "@yume-chan/no-data-view";
import type { ScrcpyMediaStreamPacket } from "@yume-chan/scrcpy";
import {
    Av1,
    h264ParseConfiguration,
    h265ParseConfiguration,
    ScrcpyVideoCodecId,
} from "@yume-chan/scrcpy";
import { WritableStream } from "@yume-chan/stream-extra";

// Map Scrcpy codecs to mediabunny codecs
const VideoCodecMap: Record<ScrcpyVideoCodecId, VideoCodec> = {
    [ScrcpyVideoCodecId.H264]: "avc",
    [ScrcpyVideoCodecId.H265]: "hevc",
    [ScrcpyVideoCodecId.AV1]: "av1",
};

function hexDigits(value: number) {
    return value.toString(16).toUpperCase();
}

function hexTwoDigits(value: number) {
    return value.toString(16).toUpperCase().padStart(2, "0");
}

function decimalTwoDigits(value: number) {
    return value.toString(10).padStart(2, "0");
}

export async function createRecordStream(videoCodec: ScrcpyVideoCodecId) {
    const muxer = new Output({
        format: new MkvOutputFormat(),
        target: new BufferTarget(),
    });

    const videoTrack = new EncodedVideoPacketSource(VideoCodecMap[videoCodec]);

    muxer.addVideoTrack(videoTrack);

    await muxer.start();

    let videoCodecConfiguration: Uint8Array | undefined;
    let firstVideoTimestamp: bigint | undefined;

    return new WritableStream<ScrcpyMediaStreamPacket>({
        async write(chunk) {
            if (chunk.type === "configuration") {
                videoCodecConfiguration = chunk.data;
                return;
            }

            if (firstVideoTimestamp === undefined) {
                if (chunk.keyframe) {
                    firstVideoTimestamp = chunk.pts!;
                } else {
                    // warn: first frame should be keyframe
                    return;
                }
            }

            // chunk.pts is in microseconds (us),
            // mediabunny requires timestamps in seconds
            const timestamp = Number((chunk.pts! - firstVideoTimestamp) / 1000n) / 1000;

            // AV1 doesn't need the configuration packet
            if (videoCodec === ScrcpyVideoCodecId.AV1) {
                const parser = new Av1(chunk.data);
                const sequenceHeader = parser.searchSequenceHeaderObu();

                if (sequenceHeader) {
                    const {
                        seq_profile: seqProfile,
                        seq_level_idx: [seqLevelIdx = 0],
                        max_frame_width_minus_1,
                        max_frame_height_minus_1,
                        color_config: {
                            BitDepth,
                            mono_chrome: monoChrome,
                            subsampling_x: subsamplingX,
                            subsampling_y: subsamplingY,
                            chroma_sample_position: chromaSamplePosition,
                            color_description_present_flag,
                        },
                    } = sequenceHeader;

                    let colorPrimaries: Av1.ColorPrimaries;
                    let transferCharacteristics: Av1.TransferCharacteristics;
                    let matrixCoefficients: Av1.MatrixCoefficients;
                    let colorRange: boolean;
                    if (color_description_present_flag) {
                        ({
                            color_primaries: colorPrimaries,
                            transfer_characteristics: transferCharacteristics,
                            matrix_coefficients: matrixCoefficients,
                            color_range: colorRange,
                        } = sequenceHeader.color_config);
                    } else {
                        colorPrimaries = Av1.ColorPrimaries.Bt709;
                        transferCharacteristics = Av1.TransferCharacteristics.Bt709;
                        matrixCoefficients = Av1.MatrixCoefficients.Bt709;
                        colorRange = false;
                    }

                    const codedWidth = max_frame_width_minus_1 + 1;
                    const codedHeight = max_frame_height_minus_1 + 1;
                    const codecString = [
                        "av01",
                        seqProfile.toString(16),
                        decimalTwoDigits(seqLevelIdx) + (sequenceHeader.seq_tier[0] ? "H" : "M"),
                        decimalTwoDigits(BitDepth),
                        monoChrome ? "1" : "0",
                        (subsamplingX ? "1" : "0") +
                            (subsamplingY ? "1" : "0") +
                            chromaSamplePosition.toString(),
                        decimalTwoDigits(colorPrimaries),
                        decimalTwoDigits(transferCharacteristics),
                        decimalTwoDigits(matrixCoefficients),
                        colorRange ? "1" : "0",
                    ].join(".");

                    await videoTrack.add(
                        new EncodedPacket(chunk.data, chunk.keyframe ? "key" : "delta", timestamp, 0),
                        {
                            decoderConfig: {
                                codec: codecString,
                                codedWidth,
                                codedHeight,
                            },
                        },
                    );

                    return;
                }

                await videoTrack.add(
                    new EncodedPacket(chunk.data, chunk.keyframe ? "key" : "delta", timestamp, 0),
                );
                return;
            }

            // For H.264/H.265, concat the configuration packet with the next frame packet
            if (videoCodecConfiguration) {
                const buffer = new Uint8Array(videoCodecConfiguration.length + chunk.data.length);
                buffer.set(videoCodecConfiguration);
                buffer.set(chunk.data, videoCodecConfiguration.length);

                let codedWidth: number;
                let codedHeight: number;
                let codecString: string;
                if (videoCodec === ScrcpyVideoCodecId.H264) {
                    const configuration = h264ParseConfiguration(videoCodecConfiguration);
                    codedWidth = configuration.encodedWidth;
                    codedHeight = configuration.encodedHeight;
                    // https://www.rfc-editor.org/rfc/rfc6381#section-3.3
                    // ISO Base Media File Format Name Space
                    codecString =
                        "avc1." +
                        hexTwoDigits(configuration.profileIndex) +
                        hexTwoDigits(configuration.constraintSet) +
                        hexTwoDigits(configuration.levelIndex);
                } else {
                    const configuration = h265ParseConfiguration(videoCodecConfiguration);
                    codedWidth = configuration.encodedWidth;
                    codedHeight = configuration.encodedHeight;
                    codecString = [
                        "hev1",
                        ["", "A", "B", "C"][configuration.generalProfileSpace]! +
                            configuration.generalProfileIndex.toString(),
                        hexDigits(getUint32LittleEndian(configuration.generalProfileCompatibilitySet, 0)),
                        (configuration.generalTierFlag ? "H" : "L") +
                            configuration.generalLevelIndex.toString(),
                        ...Array.from(configuration.generalConstraintSet, hexDigits),
                    ].join(".");
                }

                await videoTrack.add(
                    new EncodedPacket(buffer, chunk.keyframe ? "key" : "delta", timestamp, 0),
                    {
                        decoderConfig: {
                            codec: codecString,
                            codedWidth,
                            codedHeight,
                        },
                    },
                );

                videoCodecConfiguration = undefined;
                return;
            }

            await videoTrack.add(
                new EncodedPacket(chunk.data, chunk.keyframe ? "key" : "delta", timestamp, 0),
            );
        },
        async close() {
            await muxer.finalize();
            const buffer = muxer.target.buffer!;

            // Save `buffer` to a file
        },
    });
}
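
For the "Save `buffer` to a file" step, one option in a browser is to trigger a download. The following is only a sketch; the file name and MIME type are assumptions:

// A minimal sketch: download the finalized buffer as a file in a browser.
function saveBuffer(buffer: ArrayBuffer, fileName = "recording.mkv") {
    const blob = new Blob([buffer], { type: "video/x-matroska" });
    const url = URL.createObjectURL(blob);

    const a = document.createElement("a");
    a.href = url;
    a.download = fileName;
    a.click();

    URL.revokeObjectURL(url);
}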

Then we can tee the videoPacketStream to pipe the video stream to both the decoder and the recorder.

import type { ScrcpyMediaStreamPacket, ScrcpyVideoCodecId } from "@yume-chan/scrcpy";
import {
    WebCodecsVideoDecoder,
    WebGLVideoFrameRenderer,
} from "@yume-chan/scrcpy-decoder-webcodecs";
import type { ReadableStream } from "@yume-chan/stream-extra";

declare const videoCodec: ScrcpyVideoCodecId;
declare const videoPacketStream: ReadableStream<ScrcpyMediaStreamPacket>;

const [decoderStream, recorderStream] = videoPacketStream.tee();

const decoder = new WebCodecsVideoDecoder({
    codec: videoCodec,
    renderer: new WebGLVideoFrameRenderer(),
});
const recorder = await createRecordStream(videoCodec);

await Promise.all([decoderStream.pipeTo(decoder.writable), recorderStream.pipeTo(recorder)]);

Convert Annex B to AVCC/HEVC manually

If you want to use other muxer libraries or formats which don't support Annex B, you can convert the video stream from Annex B to AVCC/HEVC manually.

H.264 Configuration

As mentioned in configuration packet, the H.264 configuration packet contains the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS), which contain information like the codec profile, resolution, cropping, framerate, etc.

The configuration packets need to be stored in two places:

  • In the video metadata, as an AVCDecoderConfigurationRecord structure
  • Prepended to the next frame data, in the video stream.
info

Configuration packets can occur multiple times in the stream, when the encoder is restarted. ONLY the first one needs to be converted to an AVCDecoderConfigurationRecord and stored in the metadata. But ALL of them must be prepended to their next frame data.

Subsequent configuration packets can be different from the first one, even including different resolution or codec profile. Most players can handle this correctly.

If frame metadata is enabled, the SPS and PPS will be in the configuration packets. Tango has a method to extract them from the configuration packet. If they can't be found in the specified buffer, an error will be thrown.

import { h264SearchConfiguration } from "@yume-chan/scrcpy";

for await (const packet of videoPacketStream) {
    if (packet.type === "configuration") {
        const { sequenceParameterSet, pictureParameterSet } =
            h264SearchConfiguration(packet.data);
        console.log(sequenceParameterSet, pictureParameterSet);
    }
}

Then you can use this function to convert them into an AvcDecoderConfigurationRecord:

// https://ffmpeg.org/doxygen/0.11/avc_8c-source.html#l00106
function h264ConfigurationToAvcDecoderConfigurationRecord(
    sequenceParameterSet: Uint8Array,
    pictureParameterSet: Uint8Array,
) {
    const buffer = new Uint8Array(
        11 + sequenceParameterSet.byteLength + pictureParameterSet.byteLength,
    );
    buffer[0] = 1;
    buffer[1] = sequenceParameterSet[1]!;
    buffer[2] = sequenceParameterSet[2]!;
    buffer[3] = sequenceParameterSet[3]!;
    buffer[4] = 0xff;
    buffer[5] = 0xe1;
    buffer[6] = sequenceParameterSet.byteLength >> 8;
    buffer[7] = sequenceParameterSet.byteLength & 0xff;
    buffer.set(sequenceParameterSet, 8);
    buffer[8 + sequenceParameterSet.byteLength] = 1;
    buffer[9 + sequenceParameterSet.byteLength] = pictureParameterSet.byteLength >> 8;
    buffer[10 + sequenceParameterSet.byteLength] = pictureParameterSet.byteLength & 0xff;
    buffer.set(pictureParameterSet, 11 + sequenceParameterSet.byteLength);
    return buffer;
}

How to use the above AvcDecoderConfigurationRecord data depends on the container format and muxer library. Usually there will be a metadata, configuration, or description field for the video stream.
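
For example, many muxer libraries accept a WebCodecs-style VideoDecoderConfig, where the record goes into the description field. This is only a sketch; the codec string shown is a hypothetical value, and where the record actually goes depends on your library.

// Sketch: passing the record as the WebCodecs-style `description` field.
// `sequenceParameterSet`/`pictureParameterSet` come from `h264SearchConfiguration` above.
declare const sequenceParameterSet: Uint8Array;
declare const pictureParameterSet: Uint8Array;

const record = h264ConfigurationToAvcDecoderConfigurationRecord(
    sequenceParameterSet,
    pictureParameterSet,
);

const decoderConfig: VideoDecoderConfig = {
    codec: "avc1.4d0029", // hypothetical; build it from the SPS as shown in the Mediabunny example above
    description: record,
};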

H.265 Configuration

The H.265 configuration packet contains the Video Parameter Set (VPS), Sequence Parameter Set (SPS), and Picture Parameter Set (PPS).

Same as with H.264, the configuration also needs to be converted and stored in the video metadata, and prepended to the next frame data in the video stream. However, the functions for doing that are different because the data is different.

First extract them from a packet:

import { h265SearchConfiguration } from "@yume-chan/scrcpy";

for await (const packet of videoPacketStream) {
    if (packet.type === "configuration") {
        const { videoParameterSet, sequenceParameterSet, pictureParameterSet } =
            h265SearchConfiguration(packet.data);
        console.log(videoParameterSet, sequenceParameterSet, pictureParameterSet);
    }
}

Then convert to HEVCDecoderConfigurationRecord:

import type { H265NaluRaw } from "@yume-chan/scrcpy";
import { h265ParseSequenceParameterSet, h265ParseVideoParameterSet } from "@yume-chan/scrcpy";

function h265ConfigurationToHevcDecoderConfigurationRecord(
    videoParameterSet: H265NaluRaw,
    sequenceParameterSet: H265NaluRaw,
    pictureParameterSet: H265NaluRaw,
) {
    const {
        profileTierLevel: {
            generalProfileTier: {
                profile_space: general_profile_space,
                tier_flag: general_tier_flag,
                profile_idc: general_profile_idc,
                profileCompatibilitySet: generalProfileCompatibilitySet,
                constraintSet: generalConstraintSet,
            },
            general_level_idc,
        },
        vps_max_layers_minus1,
        vps_temporal_id_nesting_flag,
    } = h265ParseVideoParameterSet(videoParameterSet.rbsp);

    const {
        chroma_format_idc,
        bit_depth_luma_minus8,
        bit_depth_chroma_minus8,
        vuiParameters: { min_spatial_segmentation_idc = 0 } = {},
    } = h265ParseSequenceParameterSet(sequenceParameterSet.rbsp);

    const buffer = new Uint8Array(
        23 +
            5 * 3 +
            videoParameterSet.data.length +
            sequenceParameterSet.data.length +
            pictureParameterSet.data.length,
    );

    /* unsigned int(8) configurationVersion = 1; */
    buffer[0] = 1;

    /*
     * unsigned int(2) general_profile_space;
     * unsigned int(1) general_tier_flag;
     * unsigned int(5) general_profile_idc;
     */
    buffer[1] = (general_profile_space << 6) | (Number(general_tier_flag) << 5) | general_profile_idc;

    /* unsigned int(32) general_profile_compatibility_flags; */
    buffer[2] = generalProfileCompatibilitySet[0]!;
    buffer[3] = generalProfileCompatibilitySet[1]!;
    buffer[4] = generalProfileCompatibilitySet[2]!;
    buffer[5] = generalProfileCompatibilitySet[3]!;

    /* unsigned int(48) general_constraint_indicator_flags; */
    buffer[6] = generalConstraintSet[0]!;
    buffer[7] = generalConstraintSet[1]!;
    buffer[8] = generalConstraintSet[2]!;
    buffer[9] = generalConstraintSet[3]!;
    buffer[10] = generalConstraintSet[4]!;
    buffer[11] = generalConstraintSet[5]!;

    /* unsigned int(8) general_level_idc; */
    buffer[12] = general_level_idc;

    /*
     * bit(4) reserved = '1111'b;
     * unsigned int(12) min_spatial_segmentation_idc;
     */
    buffer[13] = 0xf0 | (min_spatial_segmentation_idc >> 8);
    buffer[14] = min_spatial_segmentation_idc;

    /*
     * bit(6) reserved = '111111'b;
     * unsigned int(2) parallelismType;
     */
    buffer[15] = 0xfc;

    /*
     * bit(6) reserved = '111111'b;
     * unsigned int(2) chromaFormat;
     */
    buffer[16] = 0xfc | chroma_format_idc;

    /*
     * bit(5) reserved = '11111'b;
     * unsigned int(3) bitDepthLumaMinus8;
     */
    buffer[17] = 0xf8 | bit_depth_luma_minus8;

    /*
     * bit(5) reserved = '11111'b;
     * unsigned int(3) bitDepthChromaMinus8;
     */
    buffer[18] = 0xf8 | bit_depth_chroma_minus8;

    /* bit(16) avgFrameRate; */
    buffer[19] = 0;
    buffer[20] = 0;

    /*
     * bit(2) constantFrameRate;
     * bit(3) numTemporalLayers;
     * bit(1) temporalIdNested;
     * unsigned int(2) lengthSizeMinusOne;
     */
    buffer[21] = ((vps_max_layers_minus1 + 1) << 3) | (Number(vps_temporal_id_nesting_flag) << 2) | 3;

    /* unsigned int(8) numOfArrays; */
    buffer[22] = 3;

    let i = 23;

    for (const nalu of [videoParameterSet, sequenceParameterSet, pictureParameterSet]) {
        /*
         * bit(1) array_completeness;
         * unsigned int(1) reserved = 0;
         * unsigned int(6) NAL_unit_type;
         */
        buffer[i] = nalu.nal_unit_type;
        i += 1;

        /* unsigned int(16) numNalus; */
        buffer[i] = 0;
        i += 1;
        buffer[i] = 1;
        i += 1;

        /* unsigned int(16) nalUnitLength; */
        buffer[i] = nalu.data.length >> 8;
        i += 1;
        buffer[i] = nalu.data.length;
        i += 1;

        buffer.set(nalu.data, i);
        i += nalu.data.length;
    }

    return buffer;
}

Frames

AVCC/HEVC uses the same length-prefixed format for frame data.

The configuration packet also needs to appear in the frame stream, but not as a separate packet: it must be prepended to the next frame packet.

info

This needs to be the original Annex B format configuration packet, not the AvcDecoderConfigurationRecord.

let configuration: Uint8Array | undefined;

for await (const packet of videoPacketStream) {
    if (packet.type === "configuration") {
        configuration = packet.data;
        // Also convert it to `AVCDecoderConfigurationRecord` and store in metadata
        continue;
    }

    if (packet.type === "data") {
        let buffer: Uint8Array;

        if (configuration) {
            buffer = new Uint8Array(configuration.byteLength + packet.data.byteLength);
            buffer.set(configuration);
            buffer.set(packet.data, configuration.byteLength);
            configuration = undefined;
        } else {
            buffer = packet.data;
        }

        console.log(buffer);
    }
}

Then the data (maybe with the configuration prepended) needs to be converted to an AVCSample structure.

import { annexBSplitNalu } from "@yume-chan/scrcpy";

function nalStreamToAvcSample(buffer: Uint8Array) {
    const nalUnits: Uint8Array[] = [];
    let totalLength = 0;

    for (const unit of annexBSplitNalu(buffer)) {
        nalUnits.push(unit);
        totalLength += unit.byteLength + 4;
    }

    const sample = new Uint8Array(totalLength);
    let offset = 0;
    for (const nalu of nalUnits) {
        sample[offset] = nalu.byteLength >> 24;
        sample[offset + 1] = nalu.byteLength >> 16;
        sample[offset + 2] = nalu.byteLength >> 8;
        sample[offset + 3] = nalu.byteLength & 0xff;
        sample.set(nalu, offset + 4);
        offset += 4 + nalu.byteLength;
    }
    return sample;
}

Again, how to save the AVCSample into the video stream depends on the container format and muxer library.
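
To tie the pieces together, here is a rough sketch with a hypothetical muxer API (muxer.addVideoSample is a placeholder, not a real library call); map it onto whatever your muxer actually provides:

import type { ScrcpyMediaStreamPacket } from "@yume-chan/scrcpy";
import type { ReadableStream } from "@yume-chan/stream-extra";

declare const videoPacketStream: ReadableStream<ScrcpyMediaStreamPacket>;
// Hypothetical muxer: takes an AVCSample, a keyframe flag, and a timestamp in microseconds
declare const muxer: {
    addVideoSample(sample: Uint8Array, keyframe: boolean, timestampUs: number): void;
};

let configuration: Uint8Array | undefined;

for await (const packet of videoPacketStream) {
    if (packet.type === "configuration") {
        configuration = packet.data;
        // Also convert it to a decoder configuration record and store it in metadata
        continue;
    }

    // Prepend the latest configuration packet to the next frame, as described above
    let data = packet.data;
    if (configuration) {
        const merged = new Uint8Array(configuration.byteLength + data.byteLength);
        merged.set(configuration);
        merged.set(data, configuration.byteLength);
        data = merged;
        configuration = undefined;
    }

    muxer.addVideoSample(nalStreamToAvcSample(data), !!packet.keyframe, Number(packet.pts!));
}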

Start recording anytime

The above examples assume you are saving the video stream from the beginning. If you want to start recording after the user clicks a button, a WritableStream alone is not enough. For example, you might need to return an object with start and stop methods and a writable stream field, and only write to Mediabunny's Output after start is called.

However, that's still not enough: because a video file must have a configuration and start with a keyframe, you must have both before recording starts.

Before v3.0

Before v3.0, configuration packets are generated only at the beginning and after device orientation changes, and keyframes are generated only at a fixed interval. It's not possible to request them manually.

The best possible approach is to cache the latest configuration and the frames starting from the latest keyframe, and use them when starting to record.

import type {
    ScrcpyMediaStreamPacket,
    ScrcpyMediaStreamConfigurationPacket,
    ScrcpyMediaStreamDataPacket,
} from "@yume-chan/scrcpy";
import type { ReadableStream } from "@yume-chan/stream-extra";
import { WritableStream } from "@yume-chan/stream-extra";

declare const videoPacketStream: ReadableStream<ScrcpyMediaStreamPacket>;

let configuration: ScrcpyMediaStreamConfigurationPacket | undefined;
const frames: ScrcpyMediaStreamDataPacket[] = [];

let recording = false;
let configurationChanged = false;

const recorder = {
    writable: new WritableStream<ScrcpyMediaStreamPacket>({
        write(chunk) {
            switch (chunk.type) {
                case "configuration":
                    configuration = chunk;
                    frames.length = 0; // Clear the array

                    if (recording) {
                        configurationChanged = true;
                    }
                    break;
                case "data":
                    if (chunk.keyframe) {
                        frames.length = 0;
                    }
                    frames.push(chunk);

                    if (recording) {
                        if (configurationChanged) {
                            // Concat `configuration` and `chunk.data` and write to muxer
                            configurationChanged = false;
                        } else {
                            // Write `chunk.data` to muxer
                        }
                    }
                    break;
            }
        },
    }),
    start() {
        recording = true;

        if (configuration) {
            // Already have configuration

            if (frames.length) {
                // Concat it with the first cached frame and write to muxer

                for (const frame of frames.slice(1)) {
                    // Write frame to muxer
                }
            } else {
                configurationChanged = true;
            }
        }
    },
    stop() {
        // ...
    },
};

Obviously, this will record several extra frames from before the requested start point.

To make it worse, because the Android MediaCodec API produces keyframes every N frames (instead of every N milliseconds), and no frames are generated when the screen content is static, those extra frames may span several seconds.

v3.0 and later

Added in v3.0, the reset video control message requests the server to restart capturing and encoding immediately, which produces new configuration and keyframe packets, so the stream can be recorded from any point.

After sending the reset video control message, some frames might already be in flight before the new configuration packet arrives. Those frame packets need to be dropped.

import type {
    ScrcpyControlMessageWriter,
    ScrcpyMediaStreamConfigurationPacket,
    ScrcpyMediaStreamPacket,
} from "@yume-chan/scrcpy";
import { WritableStream } from "@yume-chan/stream-extra";

declare const controller: ScrcpyControlMessageWriter;

let configuration: ScrcpyMediaStreamConfigurationPacket | undefined;

let starting = false;
let recording = false;

const recorder = {
    writable: new WritableStream<ScrcpyMediaStreamPacket>({
        write(chunk) {
            switch (chunk.type) {
                case "configuration":
                    configuration = chunk;

                    // Wait until the next configuration to really start recording
                    if (starting) {
                        recording = true;
                    }
                    break;
                case "data":
                    if (recording) {
                        if (configuration) {
                            // Concat `configuration` and `chunk.data` and write to muxer
                            configuration = undefined;
                        } else {
                            // Write `chunk.data` to muxer
                        }
                    }
                    break;
            }
        },
    }),
    async start() {
        starting = true;

        // Request Scrcpy server to produce a new configuration and keyframe
        await controller.resetVideo();
    },
    stop() {
        // ...
    },
};
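
Usage then looks roughly like this: pipe the video stream into the recorder as before, and call start() from a UI event (recordButton is a hypothetical element, and recorder and videoPacketStream come from the snippets above):

import type { ScrcpyMediaStreamPacket } from "@yume-chan/scrcpy";
import type { ReadableStream } from "@yume-chan/stream-extra";

declare const videoPacketStream: ReadableStream<ScrcpyMediaStreamPacket>;
declare const recordButton: HTMLButtonElement; // hypothetical UI element

// Keep feeding the recorder; it ignores packets until `start()` takes effect
void videoPacketStream.pipeTo(recorder.writable);

recordButton.addEventListener("click", () => {
    void recorder.start();
});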