Of course, even I would like to build “Nancy” from scratch on my own. I am an engineer of sorts, after all.

ebata

4 months ago

I don't know whether research and development on applying generative AI to business operations has peaked or will keep growing, but quite a few such projects are running right now.

I have a question of my own: is generative AI really something to be used like a library for business tools?

To me, the most distinctive feature of generative AI is its “flavor of interactive conversation with humans.”

It is like the way non-alcoholic beer is labeled a “beer-taste beverage.”

Of course, interactive conversation is not the only feature of generative AI.

Here is a rough list of the named core technologies and components of generative AI, including those used for text, images, and audio (this is a memo for myself).

- GPT (Generative Pre-trained Transformer): A representative family of large language models, used for text generation

- BERT (Bidirectional Encoder Representations from Transformers): A model that reads text in both directions and is widely used in natural language processing tasks

- T5 (Text-to-Text Transfer Transformer): A versatile generative AI model that handles text generation, translation, summarization, and more

- DALL-E: A model that generates images from text

- Stable Diffusion: A diffusion model that generates images from text prompts

- StyleGAN: A GAN-based image generation model specialized in high-resolution images

- VQ-VAE (Vector Quantized Variational Autoencoder): An autoencoder model used for generating images and audio

- Turing-NLG: A large language model for text generation developed by Microsoft

- Claude: A large language model developed by Anthropic

- LaMDA (Language Model for Dialogue Applications): A large language model developed by Google and specialized for dialogue

- Codex: A GPT-3-based model specialized in generating program code

- Whisper: A generative AI model specialized in speech recognition and transcription

The number of generative AI models will surely keep growing, so listing terms like this may not mean much. Still, I strongly feel that unless I keep these terms in my head for now, I will not be able to keep up, or rather, I will be left behind.

-----

For engineers, the bluff of “pretending to understand” is one of the tricks of the trade.

It is also characteristic of engineering work that “pretending to understand” turns into actually understanding fairly often.

That is because once you pretend to understand, the work gets thrown at you, and in the end you cannot escape. It is not a very pleasant story, but an engineer cannot keep the shop open without steadily stocking the latest technology, processing it, and selling it.

Back to the main topic.

-----

Thinking along these lines, the fact that I regard the “flavor of interactive conversation with humans” as the greatest feature of generative AI means that I am merely a consumer (user) of generative AI and do not see myself as a “generative AI developer.”

For an engineer like me, whose creed is “you are only worth what you build,” this is not a good tendency. (I have even written my own prototype TCP/IP protocol stack, which is generally considered a waste of time.)

Still, I feel that becoming a “generative AI developer” at this point would be a bit tough.

That is because generative AI cannot be built on a “small scale.”

LLMs (large language models) are said to handle hundreds of thousands to millions of words, and the number of “tokens,” which include word fragments, symbols, and grammatical elements, is said to comfortably exceed 50,000 these days.
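As a memo on what a “token” that is only part of a word looks like, here is a toy sketch of byte-pair encoding (BPE), one common way such vocabularies are built. This is only an illustration under my own assumptions: the function names (`most_frequent_pair`, `merge_pair`, `learn_bpe`) and the tiny corpus are hypothetical, and real tokenizers work on bytes and far larger corpora.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word count."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Start from single characters and add one merged token per step."""
    words = Counter(tuple(w) for w in corpus.split())
    vocab = {ch for word in words for ch in word}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
        vocab.add(pair[0] + pair[1])
    return vocab, merges


vocab, merges = learn_bpe("low low low lower lowest", num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

Each merge adds exactly one token to the vocabulary, so a vocabulary of more than 50,000 tokens corresponds to tens of thousands of such merges learned over a very large corpus.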

That is well beyond what a weekend engineer can handle.
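To put a rough number on this for myself, here is a back-of-the-envelope sketch of the size of just the token embedding table. The vocabulary size of 50,000 comes from the figure above. The hidden size of 768 and the 4 bytes per parameter are assumptions chosen only for illustration, and the helper names are hypothetical.

```python
def embedding_parameters(vocab_size: int, hidden_size: int) -> int:
    """Number of weights in a vocab_size x hidden_size embedding table."""
    return vocab_size * hidden_size


def memory_megabytes(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Storage needed for the weights, in megabytes (10**6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6


VOCAB_SIZE = 50_000  # "over 50,000" tokens, as mentioned in the text
HIDDEN_SIZE = 768    # assumption: a small hidden size, for illustration only

params = embedding_parameters(VOCAB_SIZE, HIDDEN_SIZE)
print(params)                    # 38400000
print(memory_megabytes(params))  # 153.6
```

This counts only the lookup table that turns tokens into vectors. It does not include any of the model's layers, nor the training data and compute needed to fit them.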

-----

So, with that in mind:

If you know of any papers or books along the lines of “generative AI built from scratch by an individual,” please do let me know.

Of course, even I would like to build “Nancy” from scratch on my own. I am an engineer of sorts, after all.


Generative AI is very useful, but I find myself wishing it would be an outrageous fool once in a while.